
Method for Segmenting Defective Machine Print Characters

IP.com Disclosure Number: IPCOM000113660D
Original Publication Date: 1994-Sep-01
Included in the Prior Art Database: 2005-Mar-27

Publishing Venue

IBM

Related People

Will, TA: AUTHOR

Abstract

Disclosed is a method for segmenting defective machine print characters on financial document images. In this case, segmenting refers to the separation of a string of characters from a document into individual characters. Many defects or imperfections can be found within these strings of characters. The defects may be due to the printing process used to print the document, to the handling of the document, or to the imaging of the document. Some of the defects are:
  o   Split characters
  o   Touching characters
  o   Connected characters due to smudges or stray marks
  o   Improperly spaced characters

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 26% of the total text.


      Disclosed is a method for segmenting defective machine print
characters on financial document images.  In this case, segmenting
refers to the separation of a string of characters from a document
into individual characters.  Many defects or imperfections can be
found within these strings of characters.  The defects may be due to
the printing process used to print the document, or to the handling
of the document, or due to the imaging of the document.  Some of the
defects are:
  o   Split characters
  o   Touching characters
  o   Connected characters due to smudges or stray marks
  o   Improperly spaced characters

      Besides having these defects, very often the character spacing,
or period, of the characters within a character string is not known
exactly, but is known only to fall within a particular range.

      This disclosure describes a segmentation method which can
segment strings of characters even when the characters are defective
as described above, and when the exact period of the characters is
not known.

Segmentation Algorithm - An example image (not reproduced in this
text version) will be used in the description of the segmentation
algorithm.
  Note that this image has the following defects:
  o   Character '3' is split
  o   Characters '4' and '5' are connected

      Note: The following description applies to characters that are
center aligned.  With appropriate modifications, the algorithm will
also work with characters that are left edge aligned or right edge
aligned.

Step 1 - Create a Horizontal Profile for the Image

      The first step of the algorithm is to create a horizontal
profile by ORing the bits in each column of the image, so that a
profile position is set whenever any pixel in that column is set.
Doing this for the example image gives its image profile.

Note that the image profile consists of five groups of '@'s; these
will be called profile segments.
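
      As an illustration only, Step 1 could be sketched as follows.
The image representation (a list of rows of 0/1 values), the function
names column_profile and profile_segments, and the use of inclusive
(start, end) column ranges are assumptions made for this sketch; the
disclosure does not specify them.

```python
def column_profile(image):
    """OR the bits in each column of a binary image.

    `image` is assumed to be a list of equal-length rows of 0/1 values.
    The result has one entry per column: 1 if any pixel in that column
    is set, otherwise 0.
    """
    width = len(image[0])
    return [int(any(row[x] for row in image)) for x in range(width)]


def profile_segments(profile):
    """Return the runs of set positions in a profile.

    Each run is reported as an inclusive (start, end) column range,
    ordered left to right.
    """
    segments = []
    start = None
    for x, bit in enumerate(profile):
        if bit and start is None:
            start = x                        # a new run begins here
        elif not bit and start is not None:
            segments.append((start, x - 1))  # the run ended at x - 1
            start = None
    if start is not None:                    # run reaches the right edge
        segments.append((start, len(profile) - 1))
    return segments
```

      A split character would show up here as two narrow segments
separated by a small gap, and connected characters as one wide
segment, which is why the later steps reason about segment widths
relative to the period.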

Step 2 - Determine The Possible Character Centers For Each Possible
Period

      In Step 2, the possible character centers for each allowable
period are determined.  A possible character center position (PCP)
for a given period is defined as follows:
  o  If the width of a profile segment (or combination of segments)
     is less than, say, 5/4 of the period, then the PCP is equal to
     the midpoint of the segment (or combination of segments).

          The following shows the PCPs (marked by '*'s and labeled
    A-E) that would be obtained for period 16 by this part of Step 2.
    Note that PCPs A, C, D and E are the midpoints of individual
    image profile segments, while PCP B is the midpoint of the
    combination of the first two segments.  Also note that no PCPs
    are obtained from the last image profile segment, since it is
    wider than 5/4 of the period.
  PCP List for Period 16, Segments (or Combinations) < 5/4 Period
    A   B  C            D             E
  --*---*--*----...
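
      The narrow-segment rule above can be sketched as follows.  This
is a minimal illustration under stated assumptions, not the
disclosure's implementation: the name possible_centers, the max_ratio
parameter (defaulting to the 5/4 mentioned in the text), the choice
to treat a "combination" as a run of consecutive segments measured
from the left edge of the first to the right edge of the last, and
the segment coordinates in the usage note are all hypothetical.  Only
the part of Step 2 quoted above is covered.

```python
def possible_centers(segments, period, max_ratio=1.25):
    """Possible character center positions (PCPs) for one period.

    `segments` is a left-to-right list of inclusive (start, end)
    column ranges.  A single segment, or a run of consecutive
    segments, contributes its midpoint as a PCP when its overall
    width is less than max_ratio * period.
    """
    limit = max_ratio * period
    centers = set()
    for i in range(len(segments)):
        start = segments[i][0]
        for j in range(i, len(segments)):
            end = segments[j][1]
            width = end - start + 1
            if width >= limit:
                break  # adding further segments only makes it wider
            centers.add((start + end) / 2)
    return sorted(centers)
```

      For example, with the made-up segments [(2, 6), (9, 13),
(18, 29), (34, 45), (50, 77)] and period 16, the limit is 20 columns.
The call returns five PCPs: the midpoints of the first four segments
plus the midpoint of the first two segments combined.  The last
segment is 28 columns wide and contributes none, which mirrors the
pattern described for PCPs A-E.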