Block Segmentation Method for Document Images

IP.com Disclosure Number: IPCOM000111282D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 6 page(s) / 124K

Publishing Venue

IBM

Related People

Hirayama, Y: AUTHOR

Abstract

Disclosed is a method for block segmentation of document images. The method correctly segments document images into text and figure areas, and can be applied to document images that have complicated column structures. Fig. 1 shows a sample document image.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 56% of the total text.

Block Segmentation Method for Document Images

      Disclosed is a method for block segmentation of document
images.  The method correctly segments document images into text and
figure areas, and can be applied to document images that have
complicated column structures.  Fig. 1 shows a sample document image.

      First, character strings, vertical and horizontal lines, and
other groups of black pixel components are extracted from a page
image by an algorithm for detecting character strings [*]  (Fig. 2).

      In the next step, a height histogram and a distance histogram
of character strings are constructed (Figs. 3A and 3B).  "Distance"
means the distance between the base lines of two adjacent character
strings in a vertical direction.  The distance histogram is called a
global histogram.
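
      The two histograms described above can be sketched as
follows.  This is only an illustration, not the disclosed
implementation: the (x, top, bottom) tuple layout, the bin size, the
function names, and the assumption that the strings form a single
column (so that sorting by base line yields vertical neighbors) are
all hypothetical.

```python
from collections import Counter

# Hypothetical representation: each character string is (x, top, bottom)
# in pixels, with `bottom` taken as the base line of the string.
def height_histogram(strings, bin_size=2):
    """Count character strings per height bin."""
    return Counter((bottom - top) // bin_size * bin_size
                   for _, top, bottom in strings)

def baseline_distances(strings):
    """Distances between base lines of vertically adjacent strings.

    Assumes a single column, so neighbors are found by sorting base lines.
    """
    baselines = sorted(bottom for _, _, bottom in strings)
    return [b - a for a, b in zip(baselines, baselines[1:])]

def distance_histogram(strings, bin_size=2):
    """Global histogram of base-line distances."""
    return Counter(d // bin_size * bin_size
                   for d in baseline_distances(strings))

# Three body-text lines followed by one taller heading-like string.
sample = [(0, 10, 20), (0, 30, 40), (0, 50, 60), (0, 70, 94)]
heights = height_histogram(sample)
distances = distance_histogram(sample)
```

      A real page with several columns would need a neighbor search
that also takes horizontal position into account; the sketch leaves
that out.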

      Since character strings in text areas are arranged regularly,
they can be merged into groups by analyzing this regularity.  In each
histogram, the distributed elements are classified into several groups
(for example, "A" in Fig. 3A and "B" in Fig. 3B).  A distance
histogram is then made for each group in the height histogram (Fig.
3C).  These distance histograms are called local histograms.  In
them, the distributed elements are also classified into several groups
(for example, "C" in Fig. 3C).
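
      One way to picture the grouping step and the local histograms
is the sketch below.  The disclosure does not say how histogram
elements are classified into groups; the gap-based clustering, the
max_gap threshold, the (top, bottom) tuple layout, and all function
names here are assumptions made for illustration.

```python
def cluster(items, key, max_gap=3):
    """Group items whose sorted key values lie within max_gap of each other."""
    groups = []
    for item in sorted(items, key=key):
        if groups and key(item) - key(groups[-1][-1]) <= max_gap:
            groups[-1].append(item)   # close enough: same group
        else:
            groups.append([item])     # gap too large: start a new group
    return groups

def height(string):
    """Height of a character string given as (top, bottom)."""
    return string[1] - string[0]

def local_distance_groups(strings, max_gap=3):
    """For each height group, group the base-line distances of its members."""
    result = []
    for group in cluster(strings, key=height, max_gap=max_gap):
        baselines = sorted(bottom for _, bottom in group)
        distances = [b - a for a, b in zip(baselines, baselines[1:])]
        result.append(cluster(distances, key=lambda d: d, max_gap=max_gap))
    return result

# Three body-text strings (height 10) and two taller strings (height 24).
sample = [(10, 20), (30, 40), (50, 60), (100, 124), (140, 164)]
groups = local_distance_groups(sample)
```

      Each entry of the result corresponds to one height group and
holds the distance groups of its local histogram.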

      The largest group in each local histogram is then detected, as
well as the corresponding group in the global distance histogram.
Finally, a correspondence between t...
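
      The detection of the largest local group and of its
counterpart in the global distance histogram might be sketched as
follows.  The text does not define "largest" or "corresponding"; here
"largest" is assumed to mean the group with the most elements, and
"corresponding" is assumed to mean the global group whose value range
overlaps that of the local group.  Both function names are
hypothetical.

```python
def largest_group(groups):
    """Return the group with the most elements (assumed meaning of 'largest')."""
    return max(groups, key=len)

def corresponding_global_group(local_group, global_groups):
    """Return the first global group whose value range overlaps the local one.

    Overlap of ranges is an assumed matching rule, not taken from the source.
    """
    low, high = min(local_group), max(local_group)
    for group in global_groups:
        if min(group) <= high and max(group) >= low:
            return group
    return None  # no global group overlaps the local group

# Distance groups of one local histogram and of the global histogram.
local_groups = [[20, 20, 21], [34]]
global_groups = [[19, 20, 20, 21, 22], [34, 35], [60]]

peak = largest_group(local_groups)
match = corresponding_global_group(peak, global_groups)
```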