Algorithm for Recognizing Low Quality Printed Characters

IP.com Disclosure Number: IPCOM000103646D
Original Publication Date: 1993-Jan-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 2 page(s) / 84K

Publishing Venue

IBM

Related People

Paoli, C: AUTHOR [+3]

Abstract

Disclosed is a computerized approach for recognizing low quality printed characters, namely merged or broken characters. The approach utilizes a word-list, such as a computerized dictionary, in the language wherein the text is provided.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      A computer program, such as SISTEMA L [1], recognizes normal
printed characters by comparing each of them with pre-stored models.
The recognition degree is expressed by a probability value.  For each
character, several candidates are suggested in order of decreasing
probability.  The word proposed by the program is a combination of
the suggested candidates, provided that the word is present in the
dictionary and has the highest degree of recognition, computed as the
sum of the recognition degrees of the characters of the word [2].  If
a word is not found in the dictionary or its recognition degree is
too low, the word formed from the highest-ranked candidate for each
character is chosen, unless the presence of merged or broken
characters is detected.
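
The word-selection rule described above can be sketched as follows.
This is a minimal illustration, not the SISTEMA L implementation: the
function name choose_word, the min_score parameter standing in for the
"too low" test, and the sample data are all assumptions, and the
merged or broken character case is not handled here.

```python
from itertools import product


def choose_word(candidates, dictionary, min_score):
    """Pick a word from per-character candidate lists.

    candidates: one list per character position, each holding
    (letter, recognition_degree) pairs in decreasing order of degree.
    dictionary: a set of valid words.
    min_score: hypothetical threshold below which a dictionary word
    is rejected as too poorly recognized.
    """
    best_word, best_score = None, float("-inf")
    # Try every combination of the suggested candidates.
    for combo in product(*candidates):
        word = "".join(letter for letter, _ in combo)
        # Word score is the sum of the per-character recognition degrees.
        score = sum(degree for _, degree in combo)
        if word in dictionary and score > best_score:
            best_word, best_score = word, score
    if best_word is not None and best_score >= min_score:
        return best_word
    # Fallback: take the top-ranked candidate at each position.
    return "".join(position[0][0] for position in candidates)


candidates = [
    [("c", 0.6), ("e", 0.3)],
    [("a", 0.7), ("o", 0.2)],
    [("l", 0.5), ("t", 0.45)],
]
dictionary = {"cat", "eat", "cot"}
print(choose_word(candidates, dictionary, min_score=1.0))
```

In this example the top-ranked letters spell "cal", which is not in
the dictionary, so the best-scoring dictionary word built from the
candidates is returned instead.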

      A character is assumed to be formed by two or more merged
characters, and is marked by a special symbol, if it is wider than
1.5 times the average character width and its recognition degree is
below a given threshold.  The subdivision of the image into
sub-images corresponding to the different characters is done using
information obtained from the dictionary.  A dictionary function [3]
is invoked to produce a word-list by combining the alternatives
available for the normal characters with any group of two or three
characters in place of the special symbol.  In this way only a few
cases, corresponding to existing words of the language, are
considered.  The image of each too-large character is then divided
into sub-image...
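
The merged-character test and the dictionary lookup described above
might look like the sketch below.  The names MERGED, mark_merged and
candidate_words, the "#" symbol, and the sample widths and degrees are
assumptions made for illustration.  Rather than enumerating every
group of two or three characters, the sketch filters the dictionary
with a regular expression, which should yield the same word-list; the
actual dictionary function [3] may work differently.

```python
import re

MERGED = "#"  # hypothetical special symbol for a suspected merged character


def mark_merged(chars, degree_threshold, width_factor=1.5):
    """Replace suspected merged characters by the special symbol.

    chars: list of (best_letter, width, recognition_degree) tuples.
    A character is marked when it is wider than width_factor times
    the average width AND its degree is below degree_threshold.
    """
    average_width = sum(width for _, width, _ in chars) / len(chars)
    marked = []
    for letter, width, degree in chars:
        too_wide = width > width_factor * average_width
        marked.append(MERGED if too_wide and degree < degree_threshold else letter)
    return marked


def candidate_words(positions, dictionary):
    """Return dictionary words compatible with the recognized positions.

    positions: each item is either MERGED or a non-empty list of
    alternative letters for a normally recognized character.
    """
    parts = []
    for alternatives in positions:
        if alternatives == MERGED:
            parts.append(".{2,3}")  # any group of two or three characters
        else:
            parts.append("[" + "".join(re.escape(a) for a in alternatives) + "]")
    pattern = re.compile("".join(parts))
    return sorted(word for word in dictionary if pattern.fullmatch(word))


# "rn" printed too close together and read as one wide, poorly matched "m".
chars = [("m", 12, 0.9), ("o", 10, 0.8), ("d", 10, 0.9), ("e", 10, 0.9), ("m", 21, 0.2)]
print(mark_merged(chars, degree_threshold=0.5))

positions = [["m", "n"], ["o", "c"], ["d"], ["e", "c"], MERGED]
words = {"modern", "model", "modem", "modest", "code"}
print(candidate_words(positions, words))
```

Only the dictionary words that fit the pattern survive, so the later
image subdivision has just a handful of hypotheses to check.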