Method of Using Frequency for Character Recognition

IP.com Disclosure Number: IPCOM000109744D
Original Publication Date: 1992-Sep-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 3 page(s) / 102K

Publishing Venue

IBM

Related People

Itoh, N: AUTHOR

Abstract

This article describes an efficient method of using frequency for character recognition to improve the proportion of cases in which the correct answer is included in the candidates.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      One useful item of information for achieving better recognition
is the frequency of each character.  In speech recognition, the use
of frequency information is common.  In contrast, it is rarely used
in character recognition, for the following two reasons:
      1)   The domain of documents to be recognized is broad.  It is
therefore difficult to determine the appropriate frequency, which
depends on the domain.
      2)   It is necessary to obtain P(X | C) (where C is the character
category) for an input feature X, but it is difficult to estimate
this parameter by using a simple model.  This is a serious problem,
particularly for Japanese, which has a large character set.

      The use of frequency in character recognition is formulated as
follows.  Let a feature vector extracted from an input image be X, and
let the character category be C.  Character recognition is then the
problem of finding the value of C for which P(C | X) is maximum.  From
Bayes' theorem,
      P(C | X) = P(C) x P(X | C) / P(X)     (x denotes multiplication)
The problem is equivalent to finding the value of C for which
P(C) x P(X | C) is maximum, because P(X) is constant for a given
input.  As mentioned above, it is not easy to estimate P(X | C).  We
therefore try to estimate P(d | C), where d denotes the distance
between X and the template of C.  In other words, we obtain the
candidates Ci by performing recognition without taking account of the
frequency and then re-evaluate these candidates by using the formula:
   P(Ci | di) = (P(Ci) x P(di | Ci)) / P(di)               (1)
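
      The re-evaluation step in formula (1) can be sketched in Python
as follows.  This is only an illustration under stated assumptions,
not the procedure of the original article: the exponential models for
the distance, the rates, the expansion of P(di) by total probability,
the priors, and the candidate list are all hypothetical.

```python
import math

def posterior(prior, d, match_rate=1.0, nonmatch_rate=0.25):
    """Formula (1): P(Ci | di) = P(Ci) x P(di | Ci) / P(di).

    ASSUMPTION (not from the article): the distance to the template of
    the true category follows an exponential density with rate
    `match_rate`, and the distance to any other template follows an
    exponential density with rate `nonmatch_rate`.
    """
    p_d_given_c = match_rate * math.exp(-match_rate * d)            # P(di | Ci)
    p_d_given_not_c = nonmatch_rate * math.exp(-nonmatch_rate * d)  # P(di | not Ci)
    # ASSUMPTION: P(di) is obtained by total probability over
    # "template is the true category" and "template is not".
    p_d = prior * p_d_given_c + (1.0 - prior) * p_d_given_not_c
    return prior * p_d_given_c / p_d

def rerank(candidates, priors):
    """Re-order (label, distance) candidates by P(Ci | di), best first."""
    scored = [(label, posterior(priors[label], d)) for label, d in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical character frequencies P(Ci).
priors = {"A": 0.30, "B": 0.01}

# Hypothetical output of a recognizer that ignores frequency:
# "B" is slightly closer to the input than "A".
candidates = [("B", 1.0), ("A", 1.2)]

ranked = rerank(candidates, priors)
for label, p in ranked:
    print(f"{label}: {p:.3f}")
```

With these made-up numbers the frequent category "A" moves ahead of
the rare category "B", even though "B" has the smaller distance.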

      Note that, unlike P(X), P(di) is not constant for a given input,
because di differs from candidate to candidate.  In
order to reduce the dependence of each term on the character
category, we use d'i (...