Method of Deskewing Scanned Business Documents

IP.com Disclosure Number: IPCOM000129210D
Original Publication Date: 2005-Oct-01
Included in the Prior Art Database: 2005-Oct-01
Document File: 1 page(s) / 24K

Publishing Venue

IBM

Abstract

Production jobs are scanned using high-volume scanners that feature an automatic document feeder (ADF). Automatic feeding often causes pages to be scanned slightly crooked, a defect known as "skew." Skew can be detected and corrected automatically through a process known as deskewing. This is a common problem for which a number of solutions exist, but those solutions are invariably computationally intensive, time-consuming, and/or error prone. A new method of skew detection is proposed that takes advantage of characteristics of common business documents to optimize speed and accuracy.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 59% of the total text.

Method of Deskewing Scanned Business Documents

Our customer's business is to scan books for short-run print jobs. These books are scanned using high-volume scanners that feature an Automatic Document Feeder (ADF). Automatic feeding often causes pages to be scanned slightly crooked, a defect known as "skew." Skew can be detected and corrected automatically through a process known as deskewing. This is a common problem for which a number of solutions exist, but those solutions are invariably computationally intensive, time-consuming, and/or error prone.

Our customer's scanned data is characterized by a high text-to-image ratio, meaning that there are many lines of text for each illustration or graphic. Novels, for instance, fall into this category. Using this knowledge of the typical case, our algorithm is optimized to provide an accurate deskew with minimal computation time. It is, of course, generic enough to handle any text-containing data. Given a well-selected search area, the skew can be detected and corrected more accurately and more quickly than with other general approaches to the problem.

Normally, skew is detected through the following algorithm:
I. Skew amount is detected
   a. Lines are found
      i. Series of glyphs are detected
      ii. Goodness of fit is applied to create a line
      iii. Angle of lines is averaged to calculate skew for page
II. Page is rotated to correct skew
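
The conventional steps listed above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: it assumes glyph detection has already produced, for each text line, a list of glyph centroids as (x, y) pairs, and it uses an ordinary least-squares fit as the "goodness of fit" step. The function names (fit_line_angle, estimate_skew, rotate_point, deskew_points) are hypothetical, and rotating centroid coordinates stands in for rotating the page image itself.

```python
import math


def fit_line_angle(points):
    """Least-squares angle (radians) of the line through glyph centroids."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # The fitted slope is sxy / sxx; atan2 avoids dividing by zero.
    return math.atan2(sxy, sxx)


def estimate_skew(text_lines):
    """Average the fitted angle of every text line (assumed pre-grouped)."""
    angles = [fit_line_angle(line) for line in text_lines if len(line) >= 2]
    if not angles:
        return 0.0  # nothing to measure, assume no skew
    return sum(angles) / len(angles)


def rotate_point(point, angle, center=(0.0, 0.0)):
    """Rotate one point by `angle` radians around `center`."""
    x, y = point
    cx, cy = center
    dx, dy = x - cx, y - cy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)


def deskew_points(text_lines):
    """Detect the skew, then rotate every centroid by the opposite angle."""
    skew = estimate_skew(text_lines)
    corrected = [[rotate_point(p, -skew) for p in line] for line in text_lines]
    return skew, corrected


if __name__ == "__main__":
    # Synthetic page: three horizontal text lines skewed by 2 degrees.
    true_skew = math.radians(2.0)
    page = [
        [rotate_point((float(x), baseline), true_skew) for x in range(0, 500, 25)]
        for baseline in (100.0, 130.0, 160.0)
    ]
    skew, _ = deskew_points(page)
    print(f"estimated skew: {math.degrees(skew):.3f} degrees")
```

Every detected line contributes to the average in this sketch, which is one reason the general approach is costly on full pages; a production version would rotate the scanned bitmap with an imaging library rather than a list of points.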

However, in this implementation, the accuracy of the algorithm is significantl...