Segmenting and Classifying Connected Variable Width Characters

IP.com Disclosure Number: IPCOM000044929D
Original Publication Date: 1983-Jan-01
Included in the Prior Art Database: 2005-Feb-06
Document File: 2 page(s) / 13K

Publishing Venue

IBM

Related People

Casey, RG: AUTHOR [+3]

Abstract

This invention relates to a method for segmenting and classifying connected variable-width characters. It is premised on the statistical expectation of there being at least one non-joined character in the document that is included in a prototype library. The method operates on a character stream from left to right and compares a window segment of the first character with the library. The comparison is repeated recursively with a window segment of increasing size until a mismatch occurs, which indicates a character boundary. The same small-to-large window comparisons are then performed on the next character. If no match can be made, a new prototype is added to the library.
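The growing-window comparison summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: a pattern is modeled as a string in which each character stands for one scanned column, a window "matches" while it is a prefix of some prototype, and a character is accepted at the widest window that equals a whole prototype. The names `segment_greedy`, `library`, and the `new<N>` labels are hypothetical.

```python
def segment_greedy(pattern, library):
    """Label a connected pattern left to right with a growing window.

    pattern: str, one symbol per scanned column (assumed encoding).
    library: dict mapping label -> prototype column string (mutated
             when an unknown shape is stored as a new prototype).
    """
    labels = []
    pos = 0
    while pos < len(pattern):
        best = None  # (label, width) of the widest full prototype match
        width = 1
        while pos + width <= len(pattern):
            window = pattern[pos:pos + width]
            # Prototypes still consistent with the current window.
            candidates = [(lab, proto) for lab, proto in library.items()
                          if proto.startswith(window)]
            if not candidates:
                break  # mismatch: boundary is at the last full match
            for lab, proto in candidates:
                if proto == window:
                    best = (lab, width)
            width += 1
        if best is None:
            # Assumption: the unexplained remainder becomes a new prototype.
            new_label = f"new{len(library)}"
            library[new_label] = pattern[pos:]
            labels.append(new_label)
            break
        labels.append(best[0])
        pos += best[1]
    return labels


library = {"r": "ab", "n": "abc", "t": "de"}
print(segment_greedy("abcde", library))  # ['n', 't']
print(segment_greedy("abxy", library))   # ['r', 'new3']
```

In the second call the window fails on the column `x`, so the remainder `xy` is stored under a fresh label, mirroring the rule that a new prototype is added when no match can be made.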

This is the abbreviated version, containing approximately 52% of the total text.

Existing segmentation methods are "open-loop"; that is, the segmenter chooses a separation point between characters and then goes on to the next pattern to be resolved. The segmenter is never required to revise a decision once it has been made. Thus, segmentation errors can propagate, and one poor segmentation choice at the beginning of a sequence of touching characters can result in a chain of mutilated patterns. The method described here, by contrast, is "closed-loop": it calls for the segmenter to go back and retry if a given segmentation choice leads to a poor result. The segmenter is united with a classifier in a feedback loop, so that the segmentation operation is completed only when all components of a sequence of touching characters have been successfully recognized.
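The difference between the two styles can be illustrated with a small sketch. It is a simplified model under assumptions, not the disclosed procedure: a pattern is a string of column symbols, the classifier is a plain dictionary lookup, and both segmenters try the widest cut first. The names `segment_open_loop`, `segment_closed_loop`, and `PROTOTYPES` are hypothetical.

```python
from typing import Callable, List, Optional

Classifier = Callable[[str], Optional[str]]


def segment_open_loop(pattern: str, classify: Classifier) -> Optional[List[str]]:
    """Commit to the first recognizable cut and never revise it."""
    labels = []
    while pattern:
        for width in range(len(pattern), 0, -1):
            label = classify(pattern[:width])
            if label is not None:
                break
        else:
            return None  # stuck: an earlier cut cannot be undone
        labels.append(label)
        pattern = pattern[width:]
    return labels


def segment_closed_loop(pattern: str, classify: Classifier) -> Optional[List[str]]:
    """Accept a cut only if the rest of the pattern is also recognized."""
    if not pattern:
        return []
    for width in range(len(pattern), 0, -1):
        label = classify(pattern[:width])
        if label is None:
            continue
        rest = segment_closed_loop(pattern[width:], classify)
        if rest is not None:
            return [label] + rest
        # Feedback: this cut led to a poor result, so retry a narrower one.
    return None


# Toy prototype table (assumed): column string -> character label.
PROTOTYPES = {"ab": "r", "abc": "n", "cd": "u"}

print(segment_open_loop("abcd", PROTOTYPES.get))    # None
print(segment_closed_loop("abcd", PROTOTYPES.get))  # ['r', 'u']
```

Here the open-loop version commits to `abc` as `n` and is left with an unrecognizable `d`, while the closed-loop version backs up, cuts after `ab`, and recognizes both pieces.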

It is assumed that a presegmenter can isolate the text into patterns, each of which may or may not consist of a sequence of touching characters. The feasibility of such a presegmenter is guaranteed by the use of spaces of about character width between successive words in text. Thus, a presegmented pattern in the w...