
Automatic Font Selection for Character Recognition

IP.com Disclosure Number: IPCOM000039808D
Original Publication Date: 1987-Aug-01
Included in the Prior Art Database: 2005-Feb-01
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Mano, T: AUTHOR [+3]

Abstract

This article describes an automatic font selection algorithm for optical character recognition. For each character row, the algorithm detects the variance of the distances between the centers of the segmented characters and the average width and height of those characters, and uses them to select, from the decision trees for various fonts, the one used to recognize the characters in that row. Typewriter fonts are classified into five groups: 10 Pitch, 12 Pitch, Letter Gothic, Orator, and Proportional.

This is the abbreviated version, containing approximately 58% of the total text.



This article describes an automatic font selection algorithm for optical character recognition. For each character row, the algorithm detects the variance of the distances between the centers of the segmented characters and the average width and height of those characters, and uses them to select, from the decision trees for various fonts, the one used to recognize the characters in that row. Typewriter fonts are classified into the following five groups:

10 Pitch group ........ Courier 10, Pica 10, Prestige Pica 10, Titan 10
12 Pitch group ........ Courier 12, Elite 12, Prestige Elite 12, OCR B 12
Letter Gothic group ... Letter Gothic 12
Orator group .......... Orator, Presenter
Proportional group .... Bold, Cubic/Triad, Roman, Title

As an example, assume that a document contains three character rows: the first and second rows are typed in Pica 10, and the third row is typed in Bold. The algorithm for recognizing the document includes the following steps:

STEP 1 The algorithm segments the character images of the first character row. Segmentation means breaking up the scanned image of the document into separate, distinct images of each character.

STEP 2 The algorithm detects the distances between the centers of the segmented characters in the character row and computes the variance of those distances. The variance for the Proportional group is relatively large, while the variances for the 10 Pitch, 12 Pitch, Letter Gothic, and Orator groups are relatively small. The purpose of detecting the variance is to separate the Proportional group from the other four groups. In the example, the first character row is typed in Pica 10, which belongs to the 10 Pitch group rather than the Proportional group, so the algorithm detects a relatively small variance and assigns the Proportional group the lowest priority among the five groups, i.e., priority 5. If the algorithm detects a relatively large variance, it assigns the Proportional group priority 1.

STEP 3 The algorithm computes the average width and height of those characters in the row whose widths and heights exceed a predetermined size, such as 10 x 20 dots. This size threshold excludes small symbols such as "." (period) and "," (comma). The purpose of Step 3 is to assign priorities to the remaining four groups, based on the differences in both average width and average height among the four groups. In the example, the algorithm assigns priorities 1 through 4 as follows, which completes the assignment of priorities 1 through 5:


Priority 1 ..... 10 Pitch group

Priority 2 ..... 12 Pitch group

Priority 3 ..... Orator group

Priority 4 ..... Letter Gothic group

Priority 5 ..... Proportional group

STEP 4 The purpose of step 4 is to recogniz...
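Steps 2 and 3 described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the article's implementation: each segmented character is assumed to be given as a bounding box in scanner dots (the hypothetical `CharBox` type), and the variance threshold `VARIANCE_THRESHOLD`, the per-group reference sizes in `REFERENCE_SIZES`, and the use of Euclidean distance to rank the four non-proportional groups are all hypothetical choices, since the article gives no numbers beyond the 10 x 20 dot size filter. When the variance is large, the four remaining groups are assumed to take priorities 2 through 5.

```python
from dataclasses import dataclass
from math import hypot
from statistics import mean, pvariance


@dataclass
class CharBox:
    """Hypothetical bounding box of one segmented character image, in scanner dots."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


# HYPOTHETICAL parameters for illustration only; the article gives no values
# except the example 10 x 20 dot size filter used in Step 3.
VARIANCE_THRESHOLD = 4.0  # squared dots; above this the row is treated as proportional
MIN_W, MIN_H = 10, 20     # characters must be larger than this to be averaged
REFERENCE_SIZES = {       # hypothetical typical (width, height) of each group
    "10 Pitch": (20.0, 30.0),
    "12 Pitch": (17.0, 28.0),
    "Orator": (23.0, 34.0),
    "Letter Gothic": (15.0, 26.0),
}


def center_spacing_variance(boxes):
    """Step 2: population variance of the distances between adjacent character centers.

    Needs at least two characters in the row.
    """
    centers = sorted(b.x + b.w / 2 for b in boxes)
    gaps = [right - left for left, right in zip(centers, centers[1:])]
    return pvariance(gaps)


def average_size(boxes):
    """Step 3: average width and height, skipping small symbols such as '.' and ','."""
    large = [b for b in boxes if b.w > MIN_W and b.h > MIN_H]
    return mean(b.w for b in large), mean(b.h for b in large)


def assign_priorities(boxes):
    """Return the five font groups ordered from priority 1 to priority 5."""
    proportional_first = center_spacing_variance(boxes) > VARIANCE_THRESHOLD
    avg_w, avg_h = average_size(boxes)
    # Rank the four remaining groups by the Euclidean distance between the
    # row's average size and each group's (hypothetical) reference size.
    ranked = sorted(
        REFERENCE_SIZES,
        key=lambda g: hypot(REFERENCE_SIZES[g][0] - avg_w,
                            REFERENCE_SIZES[g][1] - avg_h),
    )
    if proportional_first:
        return ["Proportional"] + ranked  # large variance: Proportional gets priority 1
    return ranked + ["Proportional"]      # small variance: Proportional gets priority 5


# A Pica 10 row as in the article's example: four letters on a 24-dot pitch plus a period.
pica_row = [CharBox(2 + 24 * k, 0, 20, 30) for k in range(4)] + [CharBox(106, 24, 4, 6)]
print(assign_priorities(pica_row))
# prints ['10 Pitch', '12 Pitch', 'Orator', 'Letter Gothic', 'Proportional']
```

The hypothetical reference sizes were chosen so that the sample row reproduces the example ordering in the priority list; real values would have to be measured from each font group. The period in the sample row is kept in the spacing calculation (its center sits on the 24-dot pitch) but is dropped from the size average by the Step 3 filter.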