Method for Character String Detection

IP.com Disclosure Number: IPCOM000101833D
Original Publication Date: 1990-Sep-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 2 page(s) / 88K

Publishing Venue

IBM

Related People

Amano, T: AUTHOR [+2]

Abstract

This article discloses a method for detecting and concatenating primitive rectangles that represent pieces of character strings. This method allows efficient implementation of the character string extraction algorithm described in Japanese published unexamined patent application 01-253077.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Two kinds of tables are used in this method. As the raster-scan
process that detects the top and bottom boundaries of character
strings proceeds, the top boundaries are recorded in one table, and
the primitive rectangles lying between the top and bottom boundaries
are recorded in another. Once a top boundary (or a piece of one) has
been found to correspond to a bottom boundary (that is, resolved) and
recorded as a primitive rectangle, it is no longer needed for later
processing. Primitive rectangles that have been concatenated and
stored in a result buffer are likewise no longer needed. Therefore,
if the resolved top boundaries, and the rectangles whose
x-coordinates overlap with a recently detected rectangle, are
eliminated from their tables, the following properties hold: (1) no
element in a table overlaps with another, and (2) the need to record
and eliminate boundaries and rectangles arises from left to right
during one line scan. This invention utilizes these properties.
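
      Property (1) can be illustrated with a small sketch. It is not
taken from the disclosure: the representation of a table element as an
inclusive (x_left, x_right) pair and the names overlaps, record and
is_disjoint_and_sorted are assumptions made only for this example.
Eliminating every element that overlaps a newly detected span before
recording the span keeps the table sorted and free of overlaps.

```python
from typing import List, Tuple

# Hypothetical representation: a table element is the x-extent of a top
# boundary or primitive rectangle, as inclusive pixel columns.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the x-ranges of a and b share at least one column."""
    return a[0] <= b[1] and b[0] <= a[1]


def record(table: List[Interval], new: Interval) -> List[Interval]:
    """Eliminate every element overlapping `new`, then record `new` in x order."""
    kept = [e for e in table if not overlaps(e, new)]
    left = [e for e in kept if e[1] < new[0]]    # wholly left of the new span
    right = [e for e in kept if e[0] > new[1]]   # wholly right of the new span
    return left + [new] + right


def is_disjoint_and_sorted(table: List[Interval]) -> bool:
    """Check property (1): each element ends before the next one begins."""
    return all(a[1] < b[0] for a, b in zip(table, table[1:]))


table = [(0, 4), (6, 9), (12, 20)]
table = record(table, (8, 13))   # (6, 9) and (12, 20) overlap and are dropped
```

      This version rescans the whole table for every new span, so it
shows only the invariant, not the efficiency the method aims for.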

      The figure shows an example of this method. Let the table for
recording unresolved top boundaries be called the URT table, and let
the table for recording recently detected primitive rectangles be
called the MRT table. Each table actually consists of two arrays
(source and destination) and indices for each array. The recording
and elimination of data are performed while elements of the arrays
are being copied from source to destination. Assume that when the
boundary detection process starts along the current scan line (the
dotted line...
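
      The two-array scheme described in this paragraph might be
sketched as follows. This is a minimal illustration under stated
assumptions, not the disclosed implementation: the class name Table,
its methods, and the inclusive (x_left, x_right) span format are
hypothetical, and spans are assumed to arrive in left-to-right order
without overlapping one another, as property (2) suggests.

```python
class Table:
    """Hypothetical two-array table (could stand for the URT or MRT table).

    Elements are inclusive (x_left, x_right) spans, sorted by x and
    mutually non-overlapping.
    """

    def __init__(self, entries=None):
        self.src = list(entries or [])  # state left by the previous scan line
        self.dst = []                   # state being built on the current line
        self.i = 0                      # read index into src

    def _pass_over(self, span):
        x_left, x_right = span
        # Elements wholly to the left of the span are copied unchanged.
        while self.i < len(self.src) and self.src[self.i][1] < x_left:
            self.dst.append(self.src[self.i])
            self.i += 1
        # Elements overlapping the span are eliminated by not copying them.
        while self.i < len(self.src) and self.src[self.i][0] <= x_right:
            self.i += 1

    def eliminate(self, span):
        """Drop the elements overlapping `span` without recording anything."""
        self._pass_over(span)

    def record(self, span):
        """Drop the elements overlapping `span`, then record `span` itself."""
        self._pass_over(span)
        self.dst.append(span)

    def end_of_line(self):
        """Copy the untouched tail, then swap the roles of the two arrays."""
        self.dst.extend(self.src[self.i:])
        self.src, self.dst, self.i = self.dst, [], 0


t = Table([(0, 4), (6, 9), (12, 20), (25, 30)])
t.record((8, 13))      # replaces (6, 9) and (12, 20)
t.eliminate((26, 27))  # removes (25, 30)
t.end_of_line()
```

      Because the read index only moves forward, each element is
visited once per scan line, however many spans are recorded or
eliminated on that line.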