Arbitrary Size Font Recognition Algorithm in Optical Character Recognition

IP.com Disclosure Number: IPCOM000109390D
Original Publication Date: 1992-Aug-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 4 page(s) / 148K

Publishing Venue

IBM

Related People

Mano, T: AUTHOR

Abstract

Disclosed are three algorithms for arbitrary-size font recognition in optical character recognition using a single Multiple Decision Tree (1). These algorithms cover: (1) how to create a Multiple Decision Tree (MDT) for arbitrary-size font recognition, (2) how to recognize characters using the MDT created in step 1, and (3) how to recognize touching characters.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

       Disclosed are three algorithms for arbitrary-size font
recognition in optical character recognition using a single Multiple
Decision Tree (1).  These algorithms cover: (1) how to create a
Multiple Decision Tree (MDT) for arbitrary-size font recognition,
(2) how to recognize characters using the MDT created in step 1, and
(3) how to recognize touching characters.
1.  Creation of an MDT for arbitrary-size font recognition

      A conventional method of MDT creation is described in (2).
Roughly speaking, sample images of characters ('A' to 'Z', 'a' to
'z', '0' to '9', and special characters) are gathered using an
optical scanner, and an MDT is created from these images by
statistical calculation.

      In the algorithm disclosed here, every character in the sample
images is scaled so that its height equals a predefined value (called
H in this article).  For example, assume a character in an image has
width w and height h.  The character is then scaled by the factor
H/h, so the scaled character has width w*H/h and height H.  After all
characters in the sample images have been scaled, an MDT is created
using the ordinary method.  In addition, the following information
is stored with the created MDT:
      (a) The average width of each scaled character.
      (b) An average image of each scaled character.
      (c) Category of each character.
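
      The scaling step and item (a) above can be sketched as follows.
This is only an illustration under stated assumptions: the value
H = 32, the nearest-neighbour resampling, the list-of-rows image
format, and the function names are all hypothetical choices, not taken
from the disclosure.  The average image of item (b) is not covered.

```python
H = 32  # assumed normalized height; the disclosure only names it H


def scale_to_height(img, target_h=H):
    """Nearest-neighbour rescale of a binary image (list of rows) by target_h/h."""
    h = len(img)
    w = len(img[0])
    factor = target_h / h                  # the scaling factor H/h
    new_w = max(1, round(w * factor))      # scaled width w*H/h
    out = []
    for y in range(target_h):
        src_y = min(h - 1, int(y / factor))
        row = [img[src_y][min(w - 1, int(x / factor))] for x in range(new_w)]
        out.append(row)
    return out


def average_width(samples, target_h=H):
    """Mean width of one character's sample images after scaling to target_h."""
    widths = [len(scale_to_height(s, target_h)[0]) for s in samples]
    return sum(widths) / len(widths)
```

      A 4-row by 2-column sample scaled to height 8 becomes 8 rows by
4 columns, which matches the w*H/h rule given above.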

      The category of a character is defined as follows.  A character
is classified into category 1 if it does not extend below the base
line and is not high (examples: a, c, e, etc.).  A character is
classified into category 2 if it does not extend below the base line
and is high (examples: b, d, h, etc.).  A character is classified
into category 3 if it extends below the base line and is not high
(examples: g, p, q, etc.), and into category 4 if it extends below
the base line and is high (example: j, etc.).  Note that the normal
'f' belongs to category 2, but the italic 'f' belongs to category 3.
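
      The four categories amount to a two-bit decision, which can be
written as a small lookup.  The function name and the two boolean
parameters below are hypothetical; the disclosure does not say how
"below the base line" or "high" is measured, so those tests are left
to the caller.

```python
def category(extends_below_baseline, is_high):
    """Return the category number (1 to 4) from the two properties in the text."""
    if not extends_below_baseline:
        return 2 if is_high else 1
    return 4 if is_high else 3


# The example characters given in the text, as (extends_below_baseline, is_high).
EXAMPLES = {
    "a": (False, False),  # category 1
    "b": (False, True),   # category 2
    "g": (True, False),   # category 3
    "j": (True, True),    # category 4
}
```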
2. How to recognize characters using the MDT created in 1.

      The only difference between the ordinary method in (1) and the
method disclosed here is the coordinates of the pixels to be
examined.  Assume the image of a character to be recognized has
width w and height h.  The MDT records the coordinates of a pixel to
be examined (say (a,b)), which refer to a character scaled to height
H.  The pixel actually examined in the image is therefore the one at
(a*h/H,b*h/H).
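
      The coordinate mapping can be sketched as follows.  Several
points here are assumptions rather than part of the disclosure: a
single binary tree stands in for the MDT, a is taken as the horizontal
and b as the vertical coordinate, scaled coordinates are truncated to
integers, and pixels outside the image are treated as white.  The
names Node, map_coordinate, and classify are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

H = 32  # assumed normalized height used when the tree was built


@dataclass
class Node:
    # An internal node tests the pixel at (a, b) in height-H coordinates;
    # a leaf carries the recognized character in `label`.
    a: int = 0
    b: int = 0
    if_white: Optional["Node"] = None
    if_black: Optional["Node"] = None
    label: Optional[str] = None


def map_coordinate(a, b, h, norm_h=H):
    """Map a tree coordinate (a, b) to the pixel (a*h/H, b*h/H) of the input image."""
    s = h / norm_h  # the scaling factor h/H
    return int(a * s), int(b * s)


def classify(img, root, norm_h=H):
    """Walk the tree, examining the unscaled image at mapped coordinates."""
    h = len(img)
    w = len(img[0])
    node = root
    while node.label is None:
        x, y = map_coordinate(node.a, node.b, h, norm_h)
        inside = 0 <= x < w and 0 <= y < h
        pixel = img[y][x] if inside else 0  # assumption: outside counts as white
        node = node.if_black if pixel else node.if_white
    return node.label
```

      The input image itself is never rescaled; only the coordinates
stored in the tree are scaled, once per examined pixel.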

      At the same time, the scaling factor h/H is stored in a table.
The table holds the average of the scaling factors used in the
recognition process for the characters in each category:
             ...
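
      The layout of the table is not shown in this abbreviated text.
Purely as an illustration of keeping an average scaling factor per
category, a running average could be maintained as below.  The class
ScalingTable and its methods are hypothetical and are not the
structure used in the disclosure.

```python
class ScalingTable:
    """Running average of the scaling factor h/H for each category (1 to 4)."""

    def __init__(self):
        self._sum = {c: 0.0 for c in (1, 2, 3, 4)}
        self._count = {c: 0 for c in (1, 2, 3, 4)}

    def record(self, category, factor):
        """Add the scaling factor used for one recognized character."""
        self._sum[category] += factor
        self._count[category] += 1

    def average(self, category):
        """Return the average factor for a category, or None if none was recorded."""
        if self._count[category] == 0:
            return None
        return self._sum[category] / self._count[category]
```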