Information-Type Discriminator

IP.com Disclosure Number: IPCOM000062275D
Original Publication Date: 1986-Nov-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 3 page(s) / 29K

Publishing Venue

IBM

Related People

Fox, SJ: AUTHOR [+2]

Abstract

This article describes a processing technique for categorizing scanned gray-scale information from a mixed-format document as either line copy (LC) or nonline copy (NLC) information. The categorized information can then be passed to an appropriate discriminator (thresholder) for processing. Scanning divides the document into picture elements (pels), each of which is assigned a video (brightness) value (V). For each pel resulting from the scan, a pair of simulated defocused pels is generated. The defocused pels are, respectively, horizontally and vertically weighted averages of the original pel value with its neighboring pels. A gradient is computed for each pel in the two sets of simulated data. An overall gradient for each pel is formed by taking the difference of the two gradients.

This is the abbreviated version, containing approximately 38% of the total text.

This article describes a processing technique for categorizing scanned gray-scale information from a mixed-format document as either line copy (LC) or nonline copy (NLC) information. The categorized information can then be passed to an appropriate discriminator (thresholder) for processing.

Scanning divides the document into picture elements (pels), each of which is assigned a video (brightness) value (V). For each pel resulting from the scan, a pair of simulated defocused pels is generated. The defocused pels are, respectively, horizontally and vertically weighted averages of the original pel value with its neighboring pels. A gradient is computed for each pel in the two sets of simulated data. An overall gradient for each pel is formed by taking the difference of the two gradients. If the overall gradient is greater than a predetermined threshold, the pel is tentatively characterized as line copy (LC). If not, the pel is tentatively characterized as continuous tone, i.e., nonline copy (NLC). This characterization method is the defocused symmetry technique.

A pel map of the tentatively classified pels is formed. The map is examined to determine how the pel of interest differs from its neighboring pels. If the difference is significant, the pel classification (LC or NLC) is changed. This characterization method is hereinafter referred to as information homogeneity. The defocused symmetry technique and the information homogeneity technique are utilized sequentially to classify each pel as either line copy or nonline copy information.

The defocused symmetry technique takes advantage of the symmetry difference between the two information types. Halftones, when examined in a local region on the order of the halftone cell size, tend to be roughly symmetric. If this local region is intentionally defocused, the component halftone dots tend to blur into a uniform gray which is nondirectional and of low gradient. Often, the major remaining directionality component is a 45° halftone screen angle.
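
The defocused symmetry step described above can be illustrated with a minimal Python sketch. The abbreviated text does not give the averaging weights, the window size, or the gradient operator, so everything specific here is an assumption made for illustration: a uniform box average of hypothetical radius 1 along one axis stands in for the "weighted" defocus, the gradient is a central difference summed over both axes, the overall gradient is taken as the absolute difference of the two gradients, and all function names are hypothetical.

```python
def clamp(i, n):
    """Clamp index i into [0, n-1] (edge pels are replicated; an assumption)."""
    return max(0, min(n - 1, i))


def defocus(image, dx, dy):
    """Box average over a (2*dy+1) x (2*dx+1) window around each pel.

    dx > 0, dy = 0 simulates a horizontally weighted defocus;
    dx = 0, dy > 0 simulates a vertically weighted one.
    """
    rows, cols = len(image), len(image[0])
    area = (2 * dx + 1) * (2 * dy + 1)
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            for j in range(-dy, dy + 1):
                for i in range(-dx, dx + 1):
                    total += image[clamp(y + j, rows)][clamp(x + i, cols)]
            out[y][x] = total / area
    return out


def gradient(image):
    """Central-difference gradient magnitude |d/dx| + |d/dy| per pel."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            gx = image[y][clamp(x + 1, cols)] - image[y][clamp(x - 1, cols)]
            gy = image[clamp(y + 1, rows)][x] - image[clamp(y - 1, rows)][x]
            out[y][x] = abs(gx) + abs(gy)
    return out


def classify_defocused_symmetry(image, threshold, radius=1):
    """Tentatively label each pel "LC" or "NLC".

    A pel is "LC" when the gradients of the horizontally and vertically
    defocused images differ by more than the threshold.
    """
    grad_h = gradient(defocus(image, radius, 0))
    grad_v = gradient(defocus(image, 0, radius))
    return [
        ["LC" if abs(h - v) > threshold else "NLC" for h, v in zip(row_h, row_v)]
        for row_h, row_v in zip(grad_h, grad_v)
    ]
```

Under these assumptions, a vertical stroke edge keeps its gradient when blurred along the stroke but loses part of it when blurred across the stroke, so the two gradients differ and the pels near the edge come out as LC. A pattern that is symmetric in both directions, such as a checkerboard of dots, yields equal gradients away from the image border and comes out as NLC.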
On the other hand, line copy, particularly at character boundaries, tends to have a specific directionality. Most character strokes are oriented either vertically or horizontally. This directionality is preserved even when defocused, particularly if the defocus is weighted in either the vertical or horizontal direction.

A small percentage of pels are, however, incorrectly classified by this technique. The classification error is partially due to the beat frequency between the frequency of the defocus cell and the frequency of the halftone screen. The other categorizing technique, information homogeneity, will often correct for this error. It is based on the assumption that the areas of line copy and nonline copy on a document are large compared to the pel size and that these large areas are homogeneous, i.e., all line copy or all nonline copy. Thus, any given pel has a high probability of being the same information type as its neighboring pels. This homogeneity of information technique examines the pel clas...
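
The homogeneity assumption stated above can be sketched in Python as follows. The abbreviated text does not say which neighbors are examined or what counts as a significant difference, so this sketch assumes the 8 surrounding pels and a hypothetical parameter `min_disagree` (default 6): a pel's tentative label is flipped when at least that many neighbors carry the other label. The function name is likewise hypothetical.

```python
def smooth_by_homogeneity(labels, min_disagree=6):
    """Flip a pel's tentative label when most of its neighbors disagree.

    labels is a 2-D list of "LC" / "NLC" strings (the tentative pel map).
    Decisions are read from the original map and written to a copy, so one
    flipped pel does not influence the decision for the next pel.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(rows):
        for x in range(cols):
            disagree = 0
            for j in (-1, 0, 1):
                for i in (-1, 0, 1):
                    if i == 0 and j == 0:
                        continue  # skip the pel of interest itself
                    ny, nx = y + j, x + i
                    if 0 <= ny < rows and 0 <= nx < cols:
                        if labels[ny][nx] != labels[y][x]:
                            disagree += 1
            if disagree >= min_disagree:
                out[y][x] = "NLC" if labels[y][x] == "LC" else "LC"
    return out
```

With these assumed settings, an isolated LC pel inside an NLC region is relabeled NLC, while pels along a straight boundary between two large regions have at most three disagreeing neighbors and keep their labels.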