Method for Analyzing Document Image Structure

IP.com Disclosure Number: IPCOM000118881D
Original Publication Date: 1997-Aug-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 2 page(s) / 60K

Publishing Venue

IBM

Related People

Amano, T: AUTHOR

Abstract

Disclosed is a method for detecting harmful connections among black pixel components in document image analysis tasks. Some types of smearing processes, which are useful in bottom-up image analysis approaches, cause connections between different types of components, such as text and a figure, or characters in different columns. These wrong connections have harmful effects on the results of document image analysis. In this invention, an image is smeared on the supposition that an appropriate number of "break points" are arranged along diagonal lines on the image. These break points prevent the black pixel components from all becoming connected, even if the smearing threshold is improperly large, so that a feedback operation can be applied.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 80% of the total text.

      Fig. 1 shows how a break point controls a smearing process.  In
this example, black pixels are smeared horizontally, and two break
points (A and B) are used.  Ordinarily, a white run (a sequence of
consecutive white pixels) that is shorter than a threshold value is
smeared, that is, replaced with a black run.  However, if a break
point (A) is located on the white run, the run is preserved.  A break
point on a black run (B) has no effect on the smearing result.
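
      The behavior described for Fig. 1 can be illustrated with a
minimal sketch of horizontal smearing on a single pixel row.  This is
not the disclosed program.  The function name smear_row, the 0/1 pixel
encoding, and the rule that only white runs bounded by black pixels on
both sides are filled are all assumptions made for illustration.

```python
def smear_row(row, threshold, break_cols=frozenset()):
    """Horizontally smear one row of a binary image (1 = black, 0 = white).

    A white run shorter than `threshold` is replaced with black pixels,
    unless at least one break point lies on that run.
    Assumption (not stated in the disclosure): only white runs bounded by
    black pixels on both sides are candidates for filling.
    """
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 1:
            i += 1
            continue
        start = i
        while i < n and row[i] == 0:
            i += 1
        end = i  # exclusive end of the white run
        bounded = start > 0 and end < n
        short = (end - start) < threshold
        blocked = any(c in break_cols for c in range(start, end))
        if bounded and short and not blocked:
            for c in range(start, end):
                out[c] = 1
    return out


# Hypothetical row: break point at column 4 lies on a white run (like A),
# break point at column 6 lies on a black run (like B).
row = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1]
smeared = smear_row(row, threshold=4, break_cols={4, 6})
```

      In this sketch the white run at columns 1-2 is filled, the white
run at columns 4-5 is preserved because a break point lies on it, and
the run at columns 8-12 is left alone because it is not shorter than
the threshold.  The break point at column 6 changes nothing.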

      A smearing program processes a document image supposing that
the break points are arranged along diagonal lines, as shown in Fig. 2.
This image has two text columns (left and right).  Use of large
threshold values causes a connection between t...
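
      The diagonal arrangement of break points can be sketched as
follows.  The disclosure says only that an appropriate number of break
points lie along diagonal lines; the spacing parameter period, the
diagonal direction, and the function name diagonal_break_points are
hypothetical choices made for this illustration.

```python
def diagonal_break_points(width, height, period):
    """Return the set of (row, col) break points on diagonal lines.

    Assumption: diagonals run from upper left to lower right and are
    spaced `period` pixels apart, i.e. (col - row) % period == 0.
    """
    return {
        (r, c)
        for r in range(height)
        for c in range(width)
        if (c - r) % period == 0
    }


def break_cols_for_row(points, r):
    """Columns holding a break point in row `r`, for horizontal smearing."""
    return sorted(c for (pr, c) in points if pr == r)


points = diagonal_break_points(width=8, height=4, period=3)
row0 = break_cols_for_row(points, 0)
row1 = break_cols_for_row(points, 1)
```

      With this layout every row and every column meets a break point
once every period pixels, and the break point positions shift by one
pixel from each row to the next.  A white gap wider than period
therefore contains a break point in every row, so it is never filled,
however large the smearing threshold is.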