Fast Method of Correcting Substitution Errors in Optical Character Recognition
Original Publication Date: 1991-Mar-01
Included in the Prior Art Database: 2005-Apr-02
This article describes an efficient method of reducing the number of candidate words in dictionary-aided recovery from errors in optical character recognition (OCR).
The method transforms each recognized candidate character into a
number designating the group to which that character belongs, and
selects candidate words according to the similarities between the
resulting number lattice and the words in a dictionary. The method
can easily be applied to large character sets, such as Kanji.
Characters are classified into a suitable number of groups by
clustering, based on the feature vectors of their templates in the
recognition dictionary. Table 1 shows an example in which 2805
templates of Kanji characters are clustered into 123 groups. Similar
(easily confusable) characters tend to be classified into the same
group. If an identifier (name) is given to each group, words can be
expressed by the cluster names.
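As a rough illustration of expressing words by cluster names, the following Python sketch maps each character to a group number and encodes a word as a tuple of those numbers. The grouping shown is hypothetical and is not taken from Table 1; in the method described here the assignment would come from clustering template feature vectors. The names CHAR_TO_GROUP and encode_word are assumptions made for this example.

```python
# Hypothetical character-to-group assignment (NOT the grouping of Table 1).
# Visually similar characters are placed in the same group on purpose.
CHAR_TO_GROUP = {
    "土": 1, "士": 1,
    "末": 2, "未": 2,
    "日": 3, "目": 3,
    "本": 4, "木": 4,
}


def encode_word(word, char_to_group=CHAR_TO_GROUP):
    """Return the cluster-name string of a word as a tuple of group numbers."""
    return tuple(char_to_group[ch] for ch in word)


# A correctly read word and a misread one share the same cluster-name string
# when every substituted character falls in the same group as the original.
correct = encode_word("日本")
misread = encode_word("目木")
print(correct, misread, correct == misread)
```

Because confusable characters tend to share a group, a substitution error often leaves the cluster-name string unchanged, which is what makes the string usable as a lookup key.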
An important point is that it is possible to define similarities
between the 123 groups by computing the distances between the centers
of their elements. All the words in the word dictionary are expressed
by these cluster names and classified accordingly. Through these
procedures, a dictionary is made in which all the words are indexed
by their cluster-name strings, and a similarity table of cluster
names is obtained. This dictionary and table are used to select
candidate words as follows (Fig. 1). Let the recognition result of
OCR be C(i,j), where i and j denote the column position and the order
of the candidate, respectively.
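The data structures just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the cluster centers are made-up 2-D points, the similarity table simply stores the Euclidean distance between centers (smaller meaning more similar), the character grouping is hypothetical, and the recognition result C(i,j) is held as a list of lists with 0-based indices. The names similarity_table, build_index, and lattice are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical cluster centers; real template feature vectors would have
# many more dimensions.
CENTERS = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}

# Hypothetical character-to-group assignment (not taken from Table 1).
CHAR_TO_GROUP = {"土": 1, "士": 1, "日": 2, "目": 2, "本": 3, "木": 3}


def similarity_table(centers):
    """Map each ordered pair of group ids to the distance between centers."""
    return {
        (a, b): math.dist(pa, pb)
        for a, pa in centers.items()
        for b, pb in centers.items()
    }


def build_index(words, char_to_group):
    """Index dictionary words by their cluster-name strings."""
    index = defaultdict(list)
    for word in words:
        key = tuple(char_to_group[ch] for ch in word)
        index[key].append(word)
    return dict(index)


table = similarity_table(CENTERS)
index = build_index(["日本", "本日", "木目", "土木"], CHAR_TO_GROUP)

# Recognition result: lattice[i][j] plays the role of C(i,j), where i is the
# column position and j is the rank of the candidate (0-based here).
lattice = [["日", "目"], ["本", "木"]]
first_choice = "".join(column[0] for column in lattice)
```

Words whose characters fall in the same groups, such as 本日 and 木目 in this toy grouping, land under the same key, so a single lookup returns all of them as candidates.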
Replace each candidate (C(i,j)) with the clust...