Template Addition Method for Character Recognition

IP.com Disclosure Number: IPCOM000101820D
Original Publication Date: 1990-Sep-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 5 page(s) / 157K

Publishing Venue

IBM

Related People

Katoh, S: AUTHOR [+2]

Abstract

Disclosed is a character recognition device that has template addition capability. The system automatically decides whether or not patterns of misread characters should be added to the standard template data set, without requiring the judgement of expert operators.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 44% of the total text.

      1. Normalized Distance

      The recognition accuracy of an OCR system can be improved by
adding new template records to the template data set, each generated
by averaging the feature vectors extracted from misread patterns of
the same category. Before the system adds a new template record,
however, it must evaluate the advantages and disadvantages of the
addition.
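As a minimal sketch of the averaging step, assuming each feature vector is a plain list of m floats and that the misread patterns have already been grouped by category, a new template record could be computed as follows (`make_template` is a hypothetical name, not part of the disclosure):

```python
from typing import List, Sequence


def make_template(misread_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the component-wise mean of the misread feature vectors.

    Hypothetical helper: every vector must have the same dimension m.
    """
    if not misread_vectors:
        raise ValueError("need at least one misread pattern")
    m = len(misread_vectors[0])
    if any(len(v) != m for v in misread_vectors):
        raise ValueError("all feature vectors must have dimension m")
    n = len(misread_vectors)
    # Average each of the m feature components over the n misread patterns.
    return [sum(v[i] for v in misread_vectors) / n for i in range(m)]


# Two misread samples of the same category produce one averaged template.
print(make_template([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```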

      For example, if some templates in the same category are located
in the neighborhood of a new template in a feature space, and no
templates in different categories are located in that neighborhood,
template addition is not expected to be very effective, because even
if the new template were not added, existing templates in the same
category would give correct recognition results.

      On the other hand, if no templates in the same category are
located in the neighborhood of a new template, and some templates in
different categories are located in that neighborhood, template
addition is expected to be effective, because if the new template
were not added, existing templates in different categories would give
some incorrect recognition results.
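The two cases above can be sketched as a neighborhood test. This is only an illustration under stated assumptions: the neighborhood is taken to be a Euclidean ball of a hypothetical `radius`, templates are hypothetical (label, vector) pairs, and mixed neighborhoods are left undecided because the two cases do not cover them.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (category label, feature vector).
Template = Tuple[str, Sequence[float]]


def addition_looks_effective(
    new_vec: Sequence[float],
    category: str,
    templates: Sequence[Template],
    radius: float,
) -> Optional[bool]:
    """Classify a candidate template by the categories of its neighbors.

    Returns True when only other-category templates lie within `radius`
    (addition expected to help), False when only same-category templates
    do (addition expected to add little), and None otherwise, because the
    disclosure describes only those two cases.
    """
    same = other = 0
    for label, vec in templates:
        if math.dist(new_vec, vec) <= radius:  # Euclidean distance
            if label == category:
                same += 1
            else:
                other += 1
    if same == 0 and other > 0:
        return True
    if same > 0 and other == 0:
        return False
    return None


existing = [("A", [0.0, 0.0]), ("B", [1.0, 0.0])]
print(addition_looks_effective([0.9, 0.0], "A", existing, 0.5))  # True
print(addition_looks_effective([0.1, 0.0], "A", existing, 0.5))  # False
```

A fixed radius makes the decision depend on the scale of the feature space, which is the kind of problem a normalized measure is meant to remove.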

      To measure the advantages and disadvantages of template
addition on a single scale, we introduced the concept of a
"normalized distance" between templates. We also established a
relation between the normalized distance and the probability of
confusion errors between categories, and use it to judge whether a
template addition is worthwhile.

      We define $V_A$ as a representative value of the within-class
dispersion of category A:

                            (Image Omitted)                        (1)

where  $m$        : dimension of the feature vector
       $n$        : total number of data items in category A
       $x_{ij}^A$ : feature data used to generate the template of
                    category A ($i = 1, \dots, m$; $j = 1, \dots, n$)

and the template of category A has components indexed by
$i = 1, \dots, m$.
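The formula for (1) did not survive extraction. One plausible reading consistent with the listed symbols, assuming the template (written here as $t_i^A$, a symbol introduced only for illustration) is the component-wise mean of the feature data and that the dispersion is a mean squared deviation, would be the following; the original may normalize differently, for example by also dividing by $m$.

```latex
t_i^A = \frac{1}{n}\sum_{j=1}^{n} x_{ij}^A ,
\qquad
V_A = \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{m}\left(x_{ij}^A - t_i^A\right)^2
```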
      We define $V_{AB}$ as a representative value of the inter-class
dispersion between categories A and B:

                            (Image Omitted)                        (2)

      The normalized distance $d_{AB}$ is defined as follows:

                            (Image Omitted)                        (3)
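Because expressions (1) to (3) were lost with the page images, the following Python sketch uses one plausible set of definitions. Every formula here is an assumption and not the disclosure's own: the template is the mean of the category's feature data, $V_A$ is the mean squared deviation from that template, $V_{AB}$ is the squared Euclidean distance between the two templates, and $d_{AB} = V_{AB} / (V_A + V_B)$, a Fisher-discriminant-style ratio swapped in for illustration. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def template(data: Sequence[Vector]) -> List[float]:
    """Assumed: the template is the component-wise mean of the feature data."""
    if not data:
        raise ValueError("category has no feature data")
    n = len(data)
    return [sum(column) / n for column in zip(*data)]


def within_dispersion(data: Sequence[Vector]) -> float:
    """Assumed form of (1): mean squared deviation from the template."""
    t = template(data)
    total = sum(sum((x - ti) ** 2 for x, ti in zip(vec, t)) for vec in data)
    return total / len(data)


def inter_dispersion(data_a: Sequence[Vector], data_b: Sequence[Vector]) -> float:
    """Assumed form of (2): squared Euclidean distance between templates."""
    return sum((a - b) ** 2 for a, b in zip(template(data_a), template(data_b)))


def normalized_distance(data_a: Sequence[Vector], data_b: Sequence[Vector]) -> float:
    """Assumed form of (3): a Fisher-style ratio V_AB / (V_A + V_B)."""
    spread = within_dispersion(data_a) + within_dispersion(data_b)
    if spread == 0:
        raise ValueError("both categories have zero within-class dispersion")
    return inter_dispersion(data_a, data_b) / spread


def all_pair_distances(
    categories: Dict[str, Sequence[Vector]],
) -> Dict[Tuple[str, str], float]:
    """Normalized distance for every unordered pair of categories."""
    return {
        (a, b): normalized_distance(categories[a], categories[b])
        for a, b in combinations(sorted(categories), 2)
    }


toy = {"A": [[0.0, 0.0], [2.0, 0.0]], "B": [[10.0, 0.0], [12.0, 0.0]]}
print(all_pair_distances(toy))  # {('A', 'B'): 50.0}
```

For the 3301 Kanji categories, the pairwise loop would evaluate 3301 × 3300 / 2 = 5,446,650 pairs, so a practical version would compute each category's template and within-class dispersion once and cache them instead of recomputing them per pair as this sketch does.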

      To find the distribution of normalized distances between
categories, we calculated the normalized distance defined by
expression (3) for every pair of categories among the Kanji
characters (3301 categories from the Japanese Industrial Standard
level-1 Chinese characters). The r...