
System and Method for Automatic Pattern Recognition of Receipt Objects Within Images Disclosure Number: IPCOM000199579D
Publication Date: 2010-Sep-09
Document File: 3 page(s) / 42K

Publishing Venue

The Prior Art Database


Inherent to any expense reimbursement system is the requirement for auditing expenses submitted by users for accuracy and correctness prior to payment. Currently, this is a process performed by visual examination of receipt documents (either by viewing originals or scanned copies) by dedicated human auditors. This process is labor intensive, such that the total number of expenses audited is often far outpaced by the incoming volume of submitted expenses. Automating such a process would result in significant cost savings, as well as enable a larger percentage of audits to occur overall. One of the most challenging aspects of automating these processes is the task of normalizing (or cleaning) input data so that it is in a form conducive to extracting information. One such data cleaning task is necessitated by the fact that many organizations allow employees to include multiple receipt documents per scanned or faxed page, to maximize ease of use. This results in the organization receiving images containing multiple receipt objects in various orientations on the same page. This can create ambiguity when attempting to extract data from the image, since 1) most optical character recognition (OCR) engines assume a single text orientation when processing images (this results in the output containing gibberish for text oriented the "wrong" way in the source image), and 2) having an unknown number of receipts per page makes effective use of rules to parse the page data for extracting information almost impossible. Thus, it is necessary to determine the boundaries of the individual receipts present in the images, such that they can each be processed separately. Although there are various document segmentation techniques in the current state of the art, these images have unique features that make the existing algorithms unsuitable.




The basic approach is to apply a number of Gaussian-family filters (specifically Gabor and LogGabor) to reveal connections between adjacent lines, based on the assumption that the space between lines within a receipt is smaller than the space between receipts. We first apply a number of small-scale Gabor filters oriented at different angles to determine the major direction of the page's text layout. Next we apply a large-scale LogGabor filter oriented in the major text direction to reveal line connectivity in the receipts. We make the assumption that the constituent parts of a receipt are more connected to each other than they are to parts of other receipts. The receipt document segmentation approach, then, is to make all parts of one receipt as connected as possible, and then use the spaces between different receipts to split them.
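The disclosure gives no code for the final splitting step. As a minimal sketch of the idea of "using spaces between receipts to split them", the hypothetical helper below (names are ours, not from the disclosure) scans a binary page mask for runs of blank rows and cuts the page into bands at sufficiently wide gaps; a real implementation would operate on the connectivity-enhanced filter response rather than a raw mask, and would handle both axes.

```python
import numpy as np

def split_on_gaps(mask, min_gap=3):
    """Split a binary page mask into vertical bands separated by runs of
    at least `min_gap` blank rows. Returns a list of (start, end) row ranges.
    A stand-in for the 'use spaces between receipts to split them' step."""
    row_has_ink = mask.any(axis=1)
    segments, start, gap = [], None, 0
    for y, ink in enumerate(row_has_ink):
        if ink:
            if start is None:
                start = y          # a new band begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # gap wide enough: close the band
                segments.append((start, y - gap + 1))
                start, gap = None, 0
    if start is not None:          # band runs to the bottom of the page
        segments.append((start, len(row_has_ink)))
    return segments

# Two "receipts" (row bands 0-4 and 10-14) separated by blank rows.
page = np.zeros((20, 10), dtype=bool)
page[0:5] = True
page[10:15] = True
bands = split_on_gaps(page)  # → [(0, 5), (10, 15)]
```

Each returned band can then be cropped out and fed to OCR as a separate image.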

Core Process:

The core process is shown in Chart 1. Note that there are some alternatives in implementation which wouldn't affect the overall approach. These will be called out below.

Chart 1: Flow Chart of Core Process

A) Input Image

The core process as shown takes a single page image as input. Another option, not shown, is inputting a multi-page image, in which case the algorithm would be run repeatedly on each page. Another optional step is to run image enhancement and/or image scaling algorithms to improve the input image's quality (in fact, our reference implementation includes these steps).
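The disclosure does not specify which enhancement or scaling algorithms its reference implementation uses. As a hedged illustration only, the sketch below shows two trivial stand-ins: nearest-neighbour upscaling and a linear contrast stretch. A production pipeline would more likely use proper resampling and adaptive enhancement.

```python
import numpy as np

def upscale_nearest(img, factor=2):
    """Nearest-neighbour upscaling: a minimal stand-in for the optional
    image-scaling step (a real pipeline would use a proper resampler)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def stretch_contrast(img):
    """Linear contrast stretch to [0, 1]: a trivial enhancement example."""
    lo, hi = img.min(), img.max()
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)

# A tiny 2x2 grayscale patch for demonstration.
patch = np.array([[0.0, 2.0],
                  [4.0, 8.0]])
big = upscale_nearest(patch, 2)        # shape (4, 4)
normed = stretch_contrast(patch)       # values now span [0, 1]
```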




B) Direction Detection using directional Gabor filters

As receipts can be placed on the page in any direction, we first need to determine the page's orientation. In the core process we look at two directions: horizontal and vertical. Another option is to perform the same steps with additional directions (e.g., to detect whether the text lies at an angle).

To determine the direction we apply a Gabor filter at least twice (the Gabor filter is a good choice because it is a Gaussian-based filter supporting multiple directions and scales; other Gaussian-based filters could potentially also be used). First we apply a small-scale, 0-degree, 2-directional Gabor filter to the source image, producing a magnitude response. Next we apply a small-scale, 90-degree, 2-directional Gabor filter to the source image, producing a second magnitude response.

We now use the frequency of the matched response to calculate the standard deviation of each resulting image. The "brighter" the magnitude response, the higher the standard deviation, and the stronger the match. If the 0-degree response's standard deviation is higher, then the major direction of the text lines in the image is horizontal. If the 90-degree response's standard deviation is higher, then the major direction of the "text" lines in the image is vertical. The response is directly related to the amount of text occurring in...
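The direction-detection step described above can be sketched as follows. This is not the disclosure's implementation: the kernel construction, the orientation convention (here θ = 0 is taken to mean a filter tuned to horizontal line structure), and the filter parameters are our assumptions, and a circular FFT convolution stands in for whatever convolution the reference implementation uses. The synthetic "page" of bright rows mimics horizontal text lines.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine-carrier) Gabor kernel. Convention assumed here:
    theta = 0 means the carrier varies along y, i.e. the filter is
    tuned to horizontal line structure."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    yr = -x * np.sin(theta) + y * np.cos(theta)  # rotated coordinate
    kern = (np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
            * np.cos(2.0 * np.pi * yr / wavelength))
    return kern - kern.mean()  # zero DC so flat regions give no response

def magnitude_response(image, kern):
    """Circular convolution via FFT (adequate for a demo); |response|."""
    h, w = image.shape
    padded = np.zeros((h, w))
    padded[:kern.shape[0], :kern.shape[1]] = kern
    # Shift the kernel centre to the origin for circular convolution.
    padded = np.roll(padded, (-(kern.shape[0] // 2), -(kern.shape[1] // 2)),
                     axis=(0, 1))
    resp = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)).real
    return np.abs(resp)

# Synthetic page: 2-pixel-tall bright rows every 8 pixels stand in for
# horizontal text lines with small inter-line spacing.
page = np.zeros((64, 64))
for row in range(0, 64, 8):
    page[row:row + 2] = 1.0

k0 = gabor_kernel(size=15, wavelength=8.0, theta=0.0, sigma=3.0)
k90 = gabor_kernel(size=15, wavelength=8.0, theta=np.pi / 2, sigma=3.0)
std0 = magnitude_response(page, k0).std()
std90 = magnitude_response(page, k90).std()

# Higher standard deviation => stronger match => that is the major direction.
major_direction = "horizontal" if std0 > std90 else "vertical"
```

For this horizontal-line test image the 0-degree response varies strongly while the 90-degree response is nearly flat, so the standard-deviation comparison selects "horizontal", matching the decision rule described above.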