Character Preprocessing And Filtering for Convenience Amount Recognition

IP.com Disclosure Number: IPCOM000121728D
Original Publication Date: 1991-Sep-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 4 page(s) / 150K

Publishing Venue

IBM

Related People

Bedell, RE: AUTHOR [+3]

Abstract

Disclosed is a technique for preprocessing and filtering the convenience amount character images to improve machine recognition of the data.

This is the abbreviated version, containing approximately 49% of the total text.

      Personal and business checks have a convenience amount area
where the amount of the check is handwritten or printed in Arabic
numerals.  The convenience amount data is viewed and manually entered
by key-entry operators.  Machine recognition can reduce this keying
effort significantly.  To maximize the recognition rate, the
characters must be correctly segmented before the recognition logic
is applied.

      Traditionally, a dynamic thresholding algorithm is used to
convert the captured greyscale data to a black/white (binary) image.
This binary image is used for character segmentation and
recognition.  The convenience amount box, and sometimes the background
security pattern, interferes with the data to be recognized.  The
thresholding parameters are normally optimized over the entire
document rather than for recognition of the convenience amount.
The typical problems encountered are:
      .    the box is not uniformly maintained,
      .    the box fragments interfere with the character, hence
           making segmentation harder and unreliable, and
      .    there are breaks in the character strokes.

      Fig. 1 shows two examples.

      The objectives for preprocessing and filtering are as follows:
      1.   First, recognize the $ symbol and the box in order to locate
           the convenience amount area.
      2.   Next, filter the $ and the box out so that the amount field can
           be segmented without any interference, but maintain the desired
           character pieces overlapping the box.
      3.   Maintain the characters without breaking up the character
           strokes.

      This article addresses the problem for a significant portion of
the personal and commercial check population.  It uses the greyscale
data, and the distribution of that data observed on each document over
a fairly wide area containing the convenience amount field, and sets
the thresholds based on the observed distribution.

      Fig. 2 shows typical greyscale distributions over a fairly
wide area containing the convenience amount.  The distribution is
obtained for each document and its modes are determined, so that the
$, character, and box data are each assigned to an appropriate cluster.
This is done quite easily even if the cluster distributions overlap.
Because the $ symbol is the darkest, a low threshold preserves just
the $ and filters out everything else.  With a tight $ recognition
logic, the presence of the $ is established reliably, and hence the
convenience amount field area is precisely located.
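
      The mode-finding and low-threshold step might be sketched as
follows.  This is a minimal illustration and not the article's actual
procedure: it assumes 8-bit greyscale with 0 as black, takes the
darkest histogram mode to be the $, and places the threshold in the
valley between the two darkest modes.  The function names and the
window and min_count parameters are hypothetical.

```python
def histogram(region, levels=256):
    """Count pixels at each grey level (assumed: 0 = black, 255 = white)."""
    counts = [0] * levels
    for row in region:
        for g in row:
            counts[g] += 1
    return counts


def find_modes(counts, window=10, min_count=1):
    """Grey levels whose count is the maximum within +/- window levels.

    The window and min_count values are illustrative assumptions.
    """
    modes = []
    for i, c in enumerate(counts):
        if c < min_count:
            continue
        lo, hi = max(0, i - window), min(len(counts), i + window + 1)
        neighbourhood = counts[lo:hi]
        # Keep only the first level that reaches the local maximum.
        if c == max(neighbourhood) and neighbourhood.index(c) == i - lo:
            modes.append(i)
    return modes


def valley_between(counts, a, b):
    """Grey level with the lowest count between modes a and b (a < b)."""
    segment = counts[a:b + 1]
    lowest = min(segment)
    candidates = [a + k for k, c in enumerate(segment) if c == lowest]
    return candidates[len(candidates) // 2]  # middle of the flat valley


def dollar_threshold(region, window=10, min_count=1):
    """Low threshold separating the darkest mode (assumed $) from the rest."""
    counts = histogram(region)
    modes = find_modes(counts, window, min_count)
    if len(modes) < 2:
        raise ValueError("need at least two histogram modes")
    return valley_between(counts, modes[0], modes[1])


def binarize(region, threshold):
    """1 where the pixel is at least as dark as the threshold, else 0."""
    return [[1 if g <= threshold else 0 for g in row] for row in region]


# Synthetic region: 30 stands in for the $, 200 for the background;
# the other grey levels are arbitrary lighter clusters.
region = [
    [200, 200, 30, 200],
    [200, 140, 90, 140],
    [200, 30, 30, 200],
]
threshold = dollar_threshold(region)
dollar_only = binarize(region, threshold)
```

      Real histograms are noisy, so the counts would normally be
smoothed before the modes are picked; that refinement is omitted here
for brevity.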

      Next, using an appropriate box threshold, the box outline and
its coordinates are established so that the box outline can be
eliminated.
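
      One way to picture the box step is sketched below.  The article
does not give the procedure, so everything here is an assumption: the
box is taken to occupy a grey range (box_lo to box_hi) lighter than
the character strokes, its edges are found as rows and columns holding
a long run of in-range pixels, and only in-range pixels on those edges
are reset to the background, so that darker character pieces crossing
the outline are kept.  All names and parameters are hypothetical.

```python
def longest_run(flags):
    """Length of the longest consecutive run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def find_box(region, box_lo, box_hi, min_run):
    """Locate box edges as rows/columns with a long run of box-grey pixels.

    Returns ((top, left, bottom, right), edge_rows, edge_cols) or None.
    The grey range and min_run are illustrative assumptions.
    """
    mask = [[box_lo <= g <= box_hi for g in row] for row in region]
    width = len(mask[0])
    rows = [y for y, row in enumerate(mask) if longest_run(row) >= min_run]
    cols = [x for x in range(width)
            if longest_run([row[x] for row in mask]) >= min_run]
    if not rows or not cols:
        return None
    return (min(rows), min(cols), max(rows), max(cols)), rows, cols


def remove_box(region, box_lo, box_hi, min_run, background=255):
    """Erase box-grey pixels on the detected edges; keep darker strokes."""
    cleaned = [list(row) for row in region]
    found = find_box(region, box_lo, box_hi, min_run)
    if found is None:
        return cleaned
    (top, left, bottom, right), rows, cols = found
    edge_rows, edge_cols = set(rows), set(cols)
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            on_edge = y in edge_rows or x in edge_cols
            if on_edge and box_lo <= region[y][x] <= box_hi:
                cleaned[y][x] = background
    return cleaned


# Synthetic example: B = background, X = box outline, C = character stroke
# that crosses the top edge of the box.
B, X, C = 200, 140, 90
sample = [
    [B, B, B, B, B, B, B, B, B],
    [B, X, X, X, C, X, X, X, B],
    [B, X, B, B, C, B, B, X, B],
    [B, X, B, B, C, B, B, X, B],
    [B, X, X, X, X, X, X, X, B],
    [B, B, B, B, B, B, B, B, B],
]
box = find_box(sample, box_lo=120, box_hi=160, min_run=3)
cleaned = remove_box(sample, box_lo=120, box_hi=160, min_run=3, background=B)
```

      In this example the stroke pixel on the top edge survives because
it is darker than the assumed box range, which corresponds to the
objective of maintaining character pieces that overlap the box.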

      Next, knowing the greyscale range for the characters inside the
box, th...