Browse Prior Art Database

Character Recognition Technique

IP.com Disclosure Number: IPCOM000077046D
Original Publication Date: 1972-Jun-01
Included in the Prior Art Database: 2005-Feb-24
Document File: 3 page(s) / 45K

Publishing Venue

IBM

Related People

McMains, LK: AUTHOR [+2]

Abstract

This character recognition technique allows a considerable reduction in both hardware and feature complexity with minimum effect on the performance levels.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 56% of the total text.

Page 1 of 3

Character Recognition Technique

This character recognition technique allows a considerable reduction in both hardware and feature complexity with minimum effect on the performance levels.

The reduction in character recognition hardware is accomplished by using information theory to predict the minimum number of tests required to resolve an unknown character. The average amount of information required to resolve an unknown character from a set of K characters is given by the following expression: H(X) </- log(2)K Bits over Character (1).

If a binary decision element is used as a test, then the maximum information gained per test is given by: H(T) </- log(2)2 = 1 Bit over Test (2). Equality in expression (1) is achieved if and only if all elements of the set (X) are equiprobable. By assuming that all elements are equiprobable, H(X) becomes the maximum average information required to solve an unknown character.

Equality in expression (2) is achieved if and only if the results of the test are equiprobable. Thus, it is necessary to select tests that have equally probable results in order to gain the maximum amount of information.

If it is assumed that equality holds in both expression (1) and (2), then an estimate of the minimum number of tests required to resolve one unknown character from a set of K characters, using binary tests, is given by:

(Image Omitted)

The equality results from the fact that log(2)K in general is a noninteger.

The technique has been applied to a test set of OCR A FONT numeric characters plus three special field separation characters. This does not, however, preclude its use on larger character sets or other fonts.

The minimum number of tests required to recognize a character from this set of 13 characters is given by the expression (3). Min # tests >/- log(2)13 = 3.7004. Thus, the minimum number of tests required is 4.

The recognition design proceeds as follows:
1) A suitable number of test characters are collected and

features are generated from the grid shown in Fig. 1.
2) A statistical tabulation is made and an initial

decision tree us generated as

shown above the dotted line in Fig. 2. (The left-hand

branch from each node indicates a negative decision at that

node, while the right-hand branch is taken for an

affirmative decision.)
3) The first level decision test is selected from

1

Page 2 of 3

these features under the

following rule: The feature must be present more than

99.5% of the time in characters...