
Text Line Detection Method from Documents Containing White Characters Printed on Black Background

IP.com Disclosure Number: IPCOM000117419D
Original Publication Date: 1996-Feb-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 4 page(s) / 107K

Publishing Venue

IBM

Related People

Amano, T: AUTHOR

Abstract

Disclosed is a method for detecting text lines in document images in which white characters are printed on a black background. Such reversed character strings sometimes appear in images of magazines and catalogs. However, most conventional document image analysis methods cannot handle reversed characters because they assume that black characters are printed on a white background. The proposed method uses the black-to-white and white-to-black transitions along an observed horizontal line instead of the black pixels themselves. As a result, it can handle both types of character strings in a single document image.
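The reason transitions help can be shown with a toy example. This is a minimal sketch, not the disclosed implementation. It assumes a line image stored as a list of 0/1 values (1 = black). The helper names `black_count`, `transition_count`, and `invert` are hypothetical. Reversing the polarity of a row changes its black-pixel count, which is dominated by the background, but leaves its transition count unchanged.

```python
# Hypothetical illustration (not from the disclosure): compare a row of
# normal text with the same row in reversed polarity (white on black).
Row = list[int]  # 1 = black pixel, 0 = white pixel


def black_count(row: Row) -> int:
    """Number of black pixels, the quantity conventional methods rely on."""
    return sum(row)


def transition_count(row: Row) -> int:
    """Number of black-to-white and white-to-black transitions in the row."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def invert(row: Row) -> Row:
    """Reverse the polarity of a row (white characters on black)."""
    return [1 - p for p in row]


normal = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
reversed_row = invert(normal)

print(black_count(normal), black_count(reversed_row))  # 6 7
print(transition_count(normal), transition_count(reversed_row))  # 6 6
```

In a real reversed block the gap between the two black counts would be far larger, because the whole background is black. The transition counts stay identical either way.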



      Fig. 1 shows the process flow of text line detection.  All
procedures except "Detect transitions" are the same as those
disclosed in (*).  Smeared run-length data are generated by
replacing short horizontal white runs with black runs during raster
scanning.  The horizontal top and bottom boundaries of the smeared
blobs are detected by comparing the smeared run-length data of
vertically successive line images.  Character string areas are then
detected as the rectangles sandwiched between the top and bottom
boundaries.
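The smearing and area-detection steps can be sketched roughly as follows. Several choices here are assumptions, not the method of (*):

- Only white gaps lying between two black runs are filled.
- The gap threshold `max_gap` is hypothetical.
- Overlapping smeared runs of successive lines are grouped with a small union-find. This is a stand-in for the boundary tracking of (*). The first and last line of each group play the role of its top and bottom boundaries.

All function names are illustrative.

```python
Run = tuple[int, int]  # half-open [start, end) span of black pixels
Rect = tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def black_runs(row: list[int]) -> list[Run]:
    """Run-length encode the black (1) pixels of one line image."""
    runs: list[Run] = []
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x
        elif not pixel and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def smear(row: list[int], max_gap: int) -> list[int]:
    """Fill white gaps of at most max_gap pixels lying between black runs."""
    out = list(row)
    runs = black_runs(row)
    for (_, gap_start), (gap_end, _) in zip(runs, runs[1:]):
        if gap_end - gap_start <= max_gap:
            out[gap_start:gap_end] = [1] * (gap_end - gap_start)
    return out


def detect_text_areas(image: list[list[int]], max_gap: int) -> list[Rect]:
    """Group overlapping smeared runs of successive lines into rectangles."""
    rows = [black_runs(smear(row, max_gap)) for row in image]
    parent = {(y, i): (y, i)
              for y, runs in enumerate(rows) for i in range(len(runs))}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Compare the smeared run-length data of vertically successive lines.
    for y in range(1, len(rows)):
        for i, (s0, e0) in enumerate(rows[y - 1]):
            for j, (s1, e1) in enumerate(rows[y]):
                if s0 < e1 and s1 < e0:  # runs overlap horizontally
                    parent[find((y - 1, i))] = find((y, j))

    # First and last line of each group act as its top and bottom boundary.
    boxes: dict = {}
    for y, runs in enumerate(rows):
        for i, (start, end) in enumerate(runs):
            root = find((y, i))
            if root in boxes:
                left, top, right, _ = boxes[root]
                boxes[root] = (min(left, start), top, max(right, end - 1), y)
            else:
                boxes[root] = (start, y, end - 1, y)
    return sorted(boxes.values(), key=lambda r: (r[1], r[0]))


image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]
print(detect_text_areas(image, max_gap=1))  # [(1, 1, 3, 2), (7, 2, 8, 3)]
```

With `max_gap=1`, the two strokes on lines 1 and 2 merge into one area. The blob on the right stays separate because its gap to them is 3 pixels. In the disclosed flow, the lines fed into this stage would presumably be the transition images from "Detect transitions" rather than the raw pixels.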

      In "Detect transitions", a line image is shifted by 1 bit, and
then an exclusive-or image is generated from the shifted line image
and the original one.  In the resultant line image, a v...
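The shift-and-XOR step can be sketched as follows. This is a minimal sketch under several assumptions, none of which the surviving text specifies:

- The line image is packed MSB-first into a Python integer, so the leftmost pixel is the most significant bit and 1 means black.
- The shift moves each pixel one position to the right.
- A white pixel is shifted in at the left edge.

The name `detect_transitions` is hypothetical.

```python
def detect_transitions(line: int, width: int) -> int:
    """Shift a bit-packed line image by one pixel and XOR it with itself.

    Pixel x is stored in bit (width - 1 - x).  A set bit in the result
    marks a pixel that differs from its left neighbour; pixel 0 is
    compared with an assumed white margin (the 0 shifted in).
    """
    mask = (1 << width) - 1
    shifted = (line & mask) >> 1  # pixel x moves to position x + 1
    return (line ^ shifted) & mask


width = 10
normal = int("0011011100", 2)
reversed_line = normal ^ ((1 << width) - 1)  # same line, reversed polarity

print(format(detect_transitions(normal, width), f"0{width}b"))         # 0010110010
print(format(detect_transitions(reversed_line, width), f"0{width}b"))  # 1010110010
```

Both lines give the same transition pattern except at the leftmost pixel. That pixel is where the reversed line's black background meets the assumed white margin. Working on such XOR images, rather than on black pixels, is what lets the rest of the flow treat normal and reversed strings alike.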