
Method for Formatting Optical Character Reader Texts

IP.com Disclosure Number: IPCOM000114630D
Original Publication Date: 1995-Jan-01
Included in the Prior Art Database: 2005-Mar-29
Document File: 4 page(s) / 127K

Publishing Venue

IBM

Related People

Hirayama, Y: AUTHOR

Abstract

Disclosed is a method for formatting characters recognized by an Optical Character Reader (OCR) system in a layout similar to that of the original image.

This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method for formatting characters recognized by
an Optical Character Reader (OCR) system in a layout similar to that
of the original image.

      Fig. 1 shows an image of a sample document.  First, the system
analyzes the layout of the image.  Next, it recognizes the characters
in the image.  It then obtains information on the character strings
and the layout, such as the sizes and positions of the blocks in the
image that contain text, figures, and tables.  This information is
used in the subsequent formatting process.
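The layout information gathered in this step can be pictured as a list of block records.  The following is a minimal sketch, assuming each block is described by a bounding box in image pixel coordinates and, for text blocks, by its recognized lines; the `Block` class, its field names, and `text_blocks` are hypothetical, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical record for one region found by layout analysis."""
    kind: str     # "text", "figure", or "table"
    left: int     # bounding box in image pixel coordinates (assumed)
    top: int
    right: int
    bottom: int
    lines: List[str] = field(default_factory=list)  # recognized text, one string per line

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def text_blocks(blocks: List[Block]) -> List[Block]:
    """Keep only the blocks that carry recognized characters."""
    return [b for b in blocks if b.kind == "text"]
```

Only the text blocks take part in the sorting and mapping stages described next; figure and table blocks are kept here simply because the layout analysis reports them too.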

      There are two stages in the formatting process: sorting the text
blocks and mapping the recognized characters to an output area.

      In the first stage, the system determines the mapping order of
the text blocks by sorting them according to the following rule: if
one block is located above or to the left of another block, the
former precedes the latter.  Applying this rule to all the text
blocks in the image determines their mapping order.  For example, the
text blocks in the sample layout shown in Fig. 2-A are ordered from
'A' to 'E'.
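The ordering rule can be sketched as follows.  The disclosure does not say how "above" and "to the left" are measured, so this sketch assumes bounding boxes whose y coordinate grows downward, treats a block as above another when it ends before the other begins vertically, and treats it as to the left when it ends before the other begins horizontally while the two overlap vertically.  Because such a pairwise rule does not give a consistent order for every layout, the sketch uses a topological sort (Kahn's algorithm) and, if it meets a cycle, falls back to the top-most, left-most block not yet placed.  `Block`, `precedes`, and `sort_blocks` are hypothetical names.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical text block; y grows downward, as in image coordinates."""
    name: str
    left: int
    top: int
    right: int
    bottom: int


def precedes(a: Block, b: Block) -> bool:
    """Assumed reading of "a is located above or to the left of b"."""
    above = a.bottom <= b.top
    overlap_vertically = a.top < b.bottom and b.top < a.bottom
    left_of = a.right <= b.left and overlap_vertically
    return above or left_of


def sort_blocks(blocks: List[Block]) -> List[Block]:
    """Return the blocks in mapping order (topological sort of `precedes`)."""
    n = len(blocks)
    successors = [[] for _ in range(n)]
    indegree = [0] * n  # number of blocks that must come before block i
    for i in range(n):
        for j in range(n):
            if i != j and precedes(blocks[i], blocks[j]):
                successors[i].append(j)
                indegree[j] += 1

    def key(i: int):
        # Tie-break among ready blocks: top-most first, then left-most.
        return (blocks[i].top, blocks[i].left, i)

    ready = [key(i) for i in range(n) if indegree[i] == 0]
    heapq.heapify(ready)
    placed = [False] * n
    order: List[Block] = []
    while len(order) < n:
        if ready:
            i = heapq.heappop(ready)[2]
        else:
            # The pairwise rule formed a cycle; break it by taking the
            # top-most, left-most block that has not been placed yet.
            i = min((k for k in range(n) if not placed[k]), key=key)
        placed[i] = True
        order.append(blocks[i])
        for j in successors[i]:
            indegree[j] -= 1
            if indegree[j] == 0 and not placed[j]:
                heapq.heappush(ready, key(j))
    return order


if __name__ == "__main__":
    # Hypothetical layout (not the actual Fig. 2-A): a title, two columns,
    # and two footer blocks, listed in scrambled order.
    layout = [
        Block("D", 0, 70, 45, 90),
        Block("A", 0, 0, 100, 10),
        Block("C", 55, 20, 100, 60),
        Block("E", 55, 70, 100, 90),
        Block("B", 0, 20, 45, 60),
    ]
    print([b.name for b in sort_blocks(layout)])  # ['A', 'B', 'C', 'D', 'E']
```

A plain `sorted(blocks, key=...)` with a pairwise comparator would be simpler, but a comparator built from "above or to the left" is not guaranteed to be transitive, which is why the sketch uses an explicit topological sort.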

      In the second stage, the system prepares an output text area
filled with space codes and maps the recognized characters in the
text blocks to this area, in the order determined in the first
stage.  This stage has three steps: determining the positions of the
text blocks, mapping the characters to the output area, and
rearranging the mapped text blocks.
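The mapping of characters into a space-filled output area can be sketched as a character grid.  This is a minimal sketch under stated assumptions: each block's pixel position is converted to a character cell by integer division by a fixed, hypothetical cell size (`cell_w`, `cell_h`), blocks are already in mapping order, characters falling outside the area are clipped, and a later block simply overwrites an earlier one where they overlap.  It does not attempt the position constraints or the rearranging step of the disclosure; `Block` and `map_blocks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical text block: top-left corner in pixels plus recognized lines."""
    left: int
    top: int
    lines: List[str]


def map_blocks(blocks: List[Block], rows: int, cols: int,
               cell_w: int, cell_h: int) -> List[str]:
    """Write each block's characters into a rows x cols area of spaces."""
    area = [[" "] * cols for _ in range(rows)]  # output area filled with space codes
    for b in blocks:  # blocks are assumed to be in mapping order already
        row0 = b.top // cell_h    # pixel position -> character cell (assumption)
        col0 = b.left // cell_w
        for dr, line in enumerate(b.lines):
            r = row0 + dr
            if r >= rows:
                break  # lines below the output area are clipped in this sketch
            for dc, ch in enumerate(line):
                c = col0 + dc
                if c >= cols:
                    break  # characters past the right edge are clipped
                area[r][c] = ch
    return ["".join(row) for row in area]
```

For example, with 10-pixel cells a block at pixel (100, 20) starts at row 2, column 10 of the output area.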

      In the first step, the system determines the position of a text
block by satisfying constraints.  For example, in Fig. 2-A...