Coarse Fine OCR Segmentation

IP.com Disclosure Number: IPCOM000051452D
Original Publication Date: 1981-Jan-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 1 page(s) / 12K

Publishing Venue

IBM

Related People

Casey, RG: AUTHOR

Abstract

An improved method for optical character recognition (OCR) character segmentation is described.

This is the abbreviated version, containing approximately 81% of the total text.

In the initial stage, successive columns (vertical strips) of the scanned array are ORed together in groups one pitch width wide, yielding a coarse line pattern (CLP) that crudely shows the distribution of white and black along the line. The CLP is then analyzed to estimate the baseline and line skew parameters: it is transformed by each of a set of trial line skews within a specified range, and for every transformed CLP (XCLP) the number of black elements in each row is counted and the row-to-row change in this count is calculated. The XCLP giving the maximum negative change (the largest decrease) is assumed to have zero skew, so the trial skew that produced it is taken as the skew of the line. The skew-corrected row that gives this maximum gradient serves as the estimated baseline.
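
The coarse stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the scanned array is taken to be a list of rows of 0/1 values (1 = black), the skew transform is modeled as a vertical shift of each CLP column by a whole number of rows, and the names coarse_line_pattern, apply_skew and estimate_skew_and_baseline are hypothetical.

```python
def coarse_line_pattern(image, pitch):
    """OR each group of `pitch` adjacent columns into a single CLP column."""
    n_groups = len(image[0]) // pitch  # assumption: a trailing partial group is ignored
    return [
        [int(any(row[g * pitch:(g + 1) * pitch])) for g in range(n_groups)]
        for row in image
    ]


def apply_skew(clp, shift_per_column):
    """Build an XCLP: column g is read from round(g * shift_per_column) rows lower."""
    height, n_cols = len(clp), len(clp[0])
    xclp = [[0] * n_cols for _ in range(height)]
    for g in range(n_cols):
        dy = round(g * shift_per_column)
        for y in range(height):
            src = y + dy
            if 0 <= src < height:  # rows shifted in from outside the array stay white
                xclp[y][g] = clp[src][g]
    return xclp


def estimate_skew_and_baseline(clp, trial_shifts):
    """Return the trial skew and row with the largest row-to-row decrease in black count."""
    best = None  # (change, skew, row)
    for s in trial_shifts:
        counts = [sum(row) for row in apply_skew(clp, s)]
        for y in range(len(counts) - 1):
            change = counts[y + 1] - counts[y]  # negative when black count drops
            if best is None or change < best[0]:
                best = (change, s, y)
    _, skew, baseline = best
    return skew, baseline


# Synthetic line: 5 characters, pitch 4, each character one row lower than the last.
pitch, height, n_chars = 4, 12, 5
image = [[0] * (pitch * n_chars) for _ in range(height)]
for g in range(n_chars):
    for y in range(3 + g, 8 + g):
        for x in range(g * pitch, g * pitch + 3):  # last column of each pitch stays white
            image[y][x] = 1

clp = coarse_line_pattern(image, pitch)
skew, baseline = estimate_skew_and_baseline(clp, trial_shifts=[-1, 0, 1])
print(skew, baseline)
```

In this sketch the baseline is reported as the last row (counting from the top, in skew-corrected coordinates) before the black count drops. A finer set of trial skews could be passed in trial_shifts, at the cost of one XCLP per trial.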

Successive pattern fields of the scanned array, each one pitch width wide, are superposed (after skew correction) and summed. The resulting sum matrix tends to be sparse in the inter-character area, so the column having the minimum sum is recorded as an "average", or coarse, X-direction segmentation position.
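
A minimal sketch of this superposition step is given below. It assumes the same 0/1 row-list representation of the scanned array, assumes that skew correction amounts to shifting each pattern field vertically by a whole number of rows, and assumes the fields are aligned to column 0; the function name coarse_cut_column and its parameters are hypothetical.

```python
def coarse_cut_column(image, pitch, shift_per_field=0):
    """Superpose pitch-wide fields, sum them, and return (min-sum column, sum matrix)."""
    height = len(image)
    n_fields = len(image[0]) // pitch
    total = [[0] * pitch for _ in range(height)]
    for f in range(n_fields):
        dy = round(f * shift_per_field)  # assumed skew correction: per-field row shift
        for y in range(height):
            src = y + dy
            if 0 <= src < height:
                for x in range(pitch):
                    total[y][x] += image[src][f * pitch + x]
    # Column sums of the sum matrix; the sparsest column is the coarse cut position.
    col_sums = [sum(total[y][x] for y in range(height)) for x in range(pitch)]
    cut = min(range(pitch), key=col_sums.__getitem__)
    return cut, total


# Synthetic line: 5 characters, pitch 4, black in the first 3 columns of each field,
# each character one row lower than the previous one.
pitch, height, n_chars = 4, 12, 5
image = [[0] * (pitch * n_chars) for _ in range(height)]
for g in range(n_chars):
    for y in range(3 + g, 8 + g):
        for x in range(g * pitch, g * pitch + 3):
            image[y][x] = 1

cut, total = coarse_cut_column(image, pitch, shift_per_field=1)
print(cut)
```

The returned column index is relative to the start of a pitch-wide field. When several columns tie for the minimum, this sketch simply takes the leftmost one.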

Each character pattern is then examined individually, with the known baseline (corrected for skew) and the average segmentation column as references. A number of neighboring columns (3 columns, for example) to the left and right of the average segmentation column are included in the view that is analyzed for full segmentation by conventional algo...
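
The windowing around the coarse position can be illustrated as below. The text does not specify the conventional fine-segmentation algorithm, so this sketch substitutes a simple stand-in rule (the column with the fewest black elements in the window, ties broken by closeness to the coarse column). The function name fine_cut and the margin parameter are hypothetical, and the use of the baseline as a reference is not modeled here.

```python
def fine_cut(region, coarse_col, margin=3):
    """Pick a cut column within `margin` columns of the coarse segmentation column.

    `region` is a list of rows of 0/1 values (1 = black) covering the area around
    one character boundary. The selection rule is a stand-in, not the disclosed one.
    """
    width = len(region[0])
    lo = max(0, coarse_col - margin)
    hi = min(width - 1, coarse_col + margin)
    window = range(lo, hi + 1)
    black = {x: sum(row[x] for row in region) for x in window}
    return min(window, key=lambda x: (black[x], abs(x - coarse_col)))


# Two touching-free characters: black in columns 0-3 and 6-9, white gap at 4-5.
region = [[1, 1, 1, 1, 0, 0, 1, 1, 1, 1] for _ in range(4)]
print(fine_cut(region, coarse_col=6, margin=3))
```

Here the coarse column (6) falls on the edge of the second character, and the 3-column window on either side lets the refined cut move into the white gap.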