Upper/Lower Case Decision for Character Readers

IP.com Disclosure Number: IPCOM000076801D
Original Publication Date: 1972-Apr-01
Included in the Prior Art Database: 2005-Feb-24
Document File: 2 page(s) / 45K

Publishing Venue

IBM

Related People

Baumgartner, RJ: AUTHOR

Abstract

In order to normalize characters to the proper height for subsequent recognition in a character recognition machine, it is desirable to determine whether each character is upper case, lower case, or a special symbol. Upper case includes all capital letters, all numerals, and the tall lower-case letters b, d, f, g, h, j, k, l, p, q, y. Lower case includes the remaining lower-case letters, except that i and t may be classed as upper case in some fonts. Special symbols include the period, comma, dash, etc.

This is the abbreviated version, containing approximately 53% of the total text.


Flowchart 100 shows the algorithm of a hardware or software module for determining the case of successive characters in a printed line on a document. Beginning at point "B", block 101 determines which characters are to be included in the computation of the parameters against which the current character height is measured. This character group may conveniently include all previous characters on the same line, plus the succeeding ten characters. In practice, the group often covers the entire line.
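The group selection of block 101 might be sketched as follows. This is only an illustration, not the disclosed module: the function name `select_group`, the list-of-heights representation, and the choice to include the current character itself in the group are all assumptions; the source specifies only "all previous characters on the same line, plus the succeeding ten characters".

```python
def select_group(line_heights, current_index, lookahead=10):
    """Return the heights used as context for the character at current_index.

    line_heights  -- heights of every character on the line, left to right
    current_index -- position of the character being classified
    lookahead     -- number of succeeding characters to include (ten in the text)

    Assumption: the current character is counted as part of the group.
    """
    # Everything before the current character, the character itself,
    # and up to `lookahead` characters after it.
    end = min(len(line_heights), current_index + lookahead + 1)
    return line_heights[:end]
```

For a line of 30 characters, classifying the character at index 5 would use the first 16 heights, while classifying the character at index 25 would use the whole line.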

Block 102 calculates a parameter MAX as a function of the heights H of the characters in the selected group. MAX is considered to be the normal upper-case character size. For each word in the character group, a maximum-height character is found. From this set of maxima, a minimum is chosen, and is called MAX. Partial words are included only if the group contains no whole words; single-character words are excluded. Block 104 calculates MAX1 as a predefined fraction of MAX. Block 105 calculates MAX2 as another fraction of MAX, but...
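The MAX calculation of block 102 can be illustrated with a minimal sketch. The name `compute_max` and the representation of each word as a `(heights, is_whole)` pair are hypothetical. Two readings of the text are assumed here: one-character entries are dropped before anything else, and partial words are used only when no whole word remains after that. MAX1 and MAX2 are left out because the source does not give their fractions.

```python
def compute_max(words):
    """Estimate the normal upper-case height (MAX) for a character group.

    words -- list of (heights, is_whole) pairs, one per word in the group;
             heights is a list of character heights, is_whole is False for
             a word cut off at the edge of the group.

    Returns None when no usable word is available (an assumed convention).
    """
    # Single-character words are excluded.
    usable = [(h, whole) for h, whole in words if len(h) > 1]

    # Partial words count only if the group has no whole words.
    whole_words = [h for h, whole in usable if whole]
    chosen = whole_words if whole_words else [h for h, _ in usable]
    if not chosen:
        return None

    # Tallest character of each word, then the smallest of those maxima.
    return min(max(h) for h in chosen)
```

With the words `[20, 12, 13]` and `[22, 12]` (both whole), a single-character word `[30]`, and a partial word `[11, 12]`, the per-word maxima considered are 20 and 22, so MAX is 20. Taking the minimum of the maxima keeps one unusually tall character from inflating the estimate.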