
Line/Symbol Separation for Raster Image Processing

IP.com Disclosure Number: IPCOM000079215D
Original Publication Date: 1973-May-01
Included in the Prior Art Database: 2005-Feb-26
Document File: 6 page(s) / 74K

Publishing Venue

IBM

Related People

Nolan, BE: AUTHOR

Abstract

The ability to distinguish between line and character data is a prerequisite to the automatic digitization of scanned data. If further image analysis, such as character recognition, is to be performed, the data for each character must be isolated. No extraneous or misleading information should be processed during character recognition. The algorithm to be described satisfies these requirements. However, several basic assumptions are made about the data to be processed.



- The first assumption is that line and character data do not intersect. If a character is touched by a line at any point, it will be processed as a part of the line.

- A second assumption is that all characters can be contained within some predefined window. No character should cross the window's perimeter, while all lines should exceed its boundaries at some point. If some lines fail to extend beyond the window, additional tests may have to be incorporated into the algorithm.

- Finally, it is assumed that image information is stored in a "flagged run-length coded format". Flagged run-length coded format provides the same information as raster-spot format. However, instead of storing information for each spot in a scan line, only the coordinates of transition points in a scan line are stored, along with a flag indicating whether the transition is from black to white or vice versa.
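The third assumption can be illustrated with a minimal sketch that converts one raster scan line into flagged transition points. The flag values, the function name, and the convention that a scan line starts in white are assumptions made for illustration; the text does not specify them.

```python
# Hypothetical flag values; the source does not say how flags are encoded.
TO_BLACK = 1  # transition from white to black
TO_WHITE = 0  # transition from black to white


def encode_scan_line(pixels):
    """Return (flag, x) pairs for each color transition in one scan line.

    pixels is a sequence of 0 (white) and 1 (black). The line is assumed to
    start in white, and x is the index of the first pixel of the new color.
    """
    transitions = []
    previous = 0  # assumed: white before the first pixel
    for x, value in enumerate(pixels):
        if value != previous:
            transitions.append((TO_BLACK if value else TO_WHITE, x))
            previous = value
    if previous == 1:
        # Close a black segment that reaches the right edge of the scan line.
        transitions.append((TO_WHITE, len(pixels)))
    return transitions


example = encode_scan_line([0, 0, 1, 1, 1, 0, 0, 1, 0])
```

A nine-spot scan line with two black segments is reduced to four transition records, which is where the storage saving comes from when black segments are sparse.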

A typical line/symbol separation situation is illustrated in Fig. 1. A portion of the flagged run-length coded representation of Fig. 1 is shown in Fig. 2. Odd columns of the array are occupied by flags, even columns by X-coordinates. Each row represents a scan line of the image. It has been demonstrated that flagged run-length coded format provides significant storage savings, while increasing processing efficiency through the use of flags. To further illustrate the algorithm, an outline of the line/symbol separation procedure will be given.
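The array layout described above (flags in odd columns, X-coordinates in even columns, counting columns from 1) might be read back as follows. This is a sketch only: the flag values, the function name, and the choice to report each black segment as an inclusive (start, end) pair are assumptions, and any padding of short rows in the array of Fig. 2 is not modeled.

```python
# Hypothetical flag values; the source does not say how flags are encoded.
TO_BLACK = 1  # transition from white to black
TO_WHITE = 0  # transition from black to white


def row_to_segments(row):
    """Convert one array row [flag, x, flag, x, ...] into black segments.

    Each segment is returned as (start, end), where end is the last black
    spot before the black-to-white transition.
    """
    flags = row[0::2]  # columns 1, 3, 5, ... when counting from 1
    xs = row[1::2]     # columns 2, 4, 6, ...
    segments = []
    start = None
    for flag, x in zip(flags, xs):
        if flag == TO_BLACK:
            start = x
        elif start is not None:
            segments.append((start, x - 1))
            start = None
    return segments


segments = row_to_segments([1, 2, 0, 5, 1, 7, 0, 8])
```

Because the flag travels with each coordinate, the reader never has to infer the color of a segment from its position in the row.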

To initiate processing, an unprocessed run must be located in the flagged run-length coded data file. Once an unprocessed run has been found, processing continues to determine whether the line segments connected to this run constitute line data or character data. This decision is made on the basis of size. Since each symbol is assumed to lie within some predefined window, and each line is assumed to exceed the window's boundaries at some point, line/symbol classification becomes a matter of boundary checks. Once an image extends beyond the predefined window, it is classified as a line. If the image is contained within the window, it is classified as a character (symbol). Some examples of typical classifications are depicted in Fig. 3. An "area of interest" is established based on the dimensions of the window and the midpoint of the initial unprocessed run. An area of interest is illustrated in Fig. 4.
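The boundary-check idea can be sketched as a traversal of connected segments that stops as soon as the connected image no longer fits the window. Several things here are assumptions rather than the disclosed procedure: segments on adjacent scan lines are treated as connected when their X-ranges overlap, the window test is done by tracking the bounding box of the segments visited so far, and the area of interest of Fig. 4 is not modeled because its exact construction is not given in the text above. All names are hypothetical.

```python
from collections import deque


def classify(segments_by_row, seed_row, seed_index, window_w, window_h):
    """Return "line" or "symbol" for the image containing the seed segment.

    segments_by_row maps a scan-line number to a list of inclusive
    (start, end) black segments on that scan line.
    """
    left, right = segments_by_row[seed_row][seed_index]
    top = bottom = seed_row
    seen = {(seed_row, seed_index)}
    queue = deque([(seed_row, seed_index)])
    while queue:
        row, index = queue.popleft()
        start, end = segments_by_row[row][index]
        # Grow the bounding box of everything connected to the seed so far.
        left, right = min(left, start), max(right, end)
        top, bottom = min(top, row), max(bottom, row)
        if right - left + 1 > window_w or bottom - top + 1 > window_h:
            return "line"  # the image extends beyond the window
        for neighbor_row in (row - 1, row + 1):
            candidates = segments_by_row.get(neighbor_row, [])
            for i, (s, e) in enumerate(candidates):
                overlaps = s <= end and e >= start  # assumed connectivity rule
                if overlaps and (neighbor_row, i) not in seen:
                    seen.add((neighbor_row, i))
                    queue.append((neighbor_row, i))
    return "symbol"  # every connected segment stayed inside the window


# A long horizontal stroke on scan line 0 and a small blob on scan lines 2-3.
image = {0: [(0, 30)], 2: [(10, 12)], 3: [(10, 12)]}
stroke = classify(image, 0, 0, window_w=6, window_h=6)
blob = classify(image, 2, 0, window_w=6, window_h=6)
```

Returning early on the first boundary violation means a long line is rejected after only a few segments have been visited, while a symbol costs one pass over its own segments.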

A "run" is defined to be a line segment in a scan line whose start point is defined b...