Browse Prior Art Database

Efficient Method for Running a Constrained Run Length Algorithm in Vertical and Horizontal Direction on Binary Image Data

IP.com Disclosure Number: IPCOM000050544D
Original Publication Date: 1982-Nov-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 67K

Publishing Venue

IBM

Related People

Wahl, FM: AUTHOR [+2]

Abstract

Subdividing digitized documents into meaningful regions is a first necessary processing step in Document Analysis Systems. Recently, a block segmentation scheme has been proposed which provides such a segmentation by means of a constrained run length algorithm (CRLA). The principle of the CRLA can be described as follows: Given an arbitrary binary sequence x of '0's and '1's, then an arbitrary sequence x may be mapped into a binary sequence y, so that the number of adjacent '0's in y equals or exceeds a predefined lower limit L. That is, short sequences (with respect to L) of '0's in x are changed into '1's in y. The segmentation of documents L usually should be a large number, e.g., 20%, of the number of pixels (picture elements) in a line.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 51% of the total text.

Page 1 of 3

Efficient Method for Running a Constrained Run Length Algorithm in Vertical and Horizontal Direction on Binary Image Data

Subdividing digitized documents into meaningful regions is a first necessary processing step in Document Analysis Systems. Recently, a block segmentation scheme has been proposed which provides such a segmentation by means of a constrained run length algorithm (CRLA). The principle of the CRLA can be described as follows: Given an arbitrary binary sequence x of '0's and '1's, then an arbitrary sequence x may be mapped into a binary sequence y, so that the number of adjacent '0's in y equals or exceeds a predefined lower limit L. That is, short sequences (with respect to L) of '0's in x are changed into '1's in y. The segmentation of documents L usually should be a large number, e.g., 20%, of the number of pixels (picture elements) in a line. In order to detect long white horizontal and vertical runs, this algorithm first is applied line by line and then column by column to the binary image data. The resulting bit maps are subsequently combined by a logical 'AND' operation to give the final segmentation result. Using a sequential processor, this algorithm easily can be applied line by line and even easily column by column if it is assumed that enough working storage for the binary input and intermediate bit maps are available. Because a typical digitized and grey level thresholded document consists of about 5,000,000 binary valued pixels, this storage requirement hardly can be fulfilled in most computers. This invention describes an efficient algorithm to calculate the CRLA in horizontal as well as in vertical direction on a line by line processing basis, avoiding the need to store the whole image in memory.

The proposed segmentation can be split into two passes. In the first pass the binary image data is read from disk into working storage, processed by the vertical CRLA pass 1 (VERRLl) and the result is written back onto disk as an intermediate bit map file, line by line, from top to bottom. In the second pass the input bit map is read, the horizontal CRLA (HORRL) is calculated, the intermediate bit map is read, the vertical CRLA pass 2 (VERRL2) is calculated, the two resulting binary sequences are combined logically by an 'AND' operation and then the intermediate bit map on disk is overwritten by the resulting bit sequences. The second pass is calculated line by line, from bottom to top. HORRL, VERRL1 and VERRL2 are calculated as follows: HORIZONTAL CRLA: The calculation of the horizontal CRLA can be performed by using a single counter C, which counts adjacent '0's within a binary image line, e.g., from left to right, as it is in Fig. 1. As can be seen, in a first step, the counter is reset to '0' by every '1' of the input sequence. While counting, the content of C is compared with Input Sequence X : 0 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 0 PROCESSING FROM LEFT TO RIGHT State of Counter C : 1 2 0 1 2 3 4 5 0 0 1 0 0...