Model and Method of Understanding Layout for Processing of Printed Forms

IP.com Disclosure Number: IPCOM000110846D
Original Publication Date: 1994-Jan-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 4 page(s) / 104K

Publishing Venue

IBM

Related People

Yamashita, A: AUTHOR

Abstract

This article describes a layout model and an analysis method for extracting specific blocks from images of printed forms. Target blocks and special ones called 'marker blocks', which determine the basis of the coordinates of a page, are defined in a layout model. The method first finds marker blocks, then extracts target blocks by using the relative coordinates specified by marker blocks. The topological features of target blocks should coincide with the description in the model.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Fig. 1 shows an example of a layout model for printed forms.
The model is shown as a table.  Since the base structure of the model
is a tree, and each line of the table corresponds to a node of the
tree, the model can also define a tree structure of blocks for use in
analyzing general documents.  In the table, 'Nest' means the level of
the tree.  Since the structure of a model for analyzing forms is
flat, the 'Nest' parameters for target blocks should be '1'.  'Name'
means the block's name.  'Dir' indicates whether the character lines
included in the block are arranged horizontally ('Hor') or vertically
('Ver'), and 'Num' gives the minimum and maximum number of such
lines.  The 'Sep' column specifies the separators that segment the
block from its neighbors above ('A'), below ('B'), on the left ('L'),
and on the right ('R').  For each of these four positions, the value
'B' indicates a black-line separator, and 'W' a white-space
separator.  If the target blocks are parts of a table, they are
generally surrounded by black lines, and therefore the definition for
the blocks in the model includes 'Sep = BBBB', which means that there
are black lines above, below, on the left, and on the right of the
blocks.  'Mark' indicates whether a block is defined as a marker
('Yes') or not ('No').  A marker serves as the reference for locating
the target blocks defined in the model.  Running heads, logos, and
marks on OCR forms, for example, might serve as markers; the
coordinates of the rectangles surrounding those objects have to be
specified in the model.  The 'X,Y,SX,SY' columns show the
x,y-coordinates of a block relative to the marker.
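
      The table columns described above can be sketched as a small
data structure.  The following Python fragment is only an
illustration, not the article's implementation.  The names BlockSpec,
expected_rect, and matches are hypothetical, as are the 'Amount'
block and all of its numeric values.  The text does not define 'SX'
and 'SY'; they are assumed here to be the width and height of the
block's rectangle, and 'X' and 'Y' are assumed to be the offsets of
its upper-left corner from the marker.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BlockSpec:
    """One row of the layout-model table, i.e. one node of the tree."""
    nest: int             # 'Nest': level in the tree (1 for form blocks)
    name: str             # 'Name': the block's name
    direction: str        # 'Dir': 'Hor' or 'Ver'
    num: Tuple[int, int]  # 'Num': (minimum, maximum) number of lines
    sep: str              # 'Sep': 4 chars, order above/below/left/right
    mark: bool            # 'Mark': True for 'Yes', False for 'No'
    x: int                # 'X': offset from the marker (assumed: left edge)
    y: int                # 'Y': offset from the marker (assumed: top edge)
    sx: int               # 'SX': assumed to be the block width
    sy: int               # 'SY': assumed to be the block height

    def __post_init__(self) -> None:
        # Reject rows that use values outside those named in the text.
        if self.direction not in ("Hor", "Ver"):
            raise ValueError("Dir must be 'Hor' or 'Ver'")
        if len(self.sep) != 4 or set(self.sep) - {"B", "W"}:
            raise ValueError("Sep must be four characters, each 'B' or 'W'")
        if self.num[0] > self.num[1]:
            raise ValueError("Num minimum exceeds maximum")

    def separators(self) -> Dict[str, str]:
        """Map each side (A, B, L, R) to its separator kind (B or W)."""
        return dict(zip("ABLR", self.sep))


def expected_rect(block: BlockSpec,
                  marker_origin: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Turn marker-relative coordinates into page coordinates.

    marker_origin is the (x, y) position at which the marker was
    actually found on the page image.
    """
    mx, my = marker_origin
    return (mx + block.x, my + block.y, block.sx, block.sy)


def matches(block: BlockSpec, direction: str,
            line_count: int, sep: str) -> bool:
    """Check observed features of a candidate block against the model."""
    low, high = block.num
    return (direction == block.direction
            and low <= line_count <= high
            and sep == block.sep)


# Hypothetical target block that is a table cell bounded by black lines.
amount = BlockSpec(nest=1, name="Amount", direction="Hor", num=(1, 2),
                   sep="BBBB", mark=False, x=300, y=400, sx=200, sy=40)

# Suppose the marker was detected at (120, 80) on the scanned page.
rect = expected_rect(amount, marker_origin=(120, 80))
```

Keeping the coordinates relative to the marker means that the same
model row applies wherever the marker is found on the page; only the
marker position passed to expected_rect changes.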

     ...