Optimal template allocation for a special purpose OCR
Disclosure Number: IPCOM000200062D
Publication Date: 2010-Sep-27
Document File: 4 page(s) / 815K

Publishing Venue

The Prior Art Database


Disclosed is a system for intelligent forms processing that uses an OCR subsystem.

Optimal template allocation for a special purpose OCR

1. Background

In the business area of intelligent forms processing (IFP), so-called "keyword spotting" is in practical use. It is a kind of data entry system that uses an OCR subsystem.

The system recognizes character strings printed on an application form and identifies the form format automatically by comparing the OCR results with predefined keywords.

For example, in the financial market, IFP reads public-money payment forms (tax bills, electric bills, gas bills, and so on) and discriminates among them automatically by recognizing keywords, which include local government names and/or electric and gas company names.
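
The matching step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the keyword table, the form type names, and the function `identify_form` are all hypothetical, and simple substring matching stands in for whatever comparison the real system uses.

```python
from typing import Dict, List, Optional

# Hypothetical keyword table (assumption): form type -> keywords expected on it.
FORM_KEYWORDS: Dict[str, List[str]] = {
    "tax_bill": ["City Tax Office", "Resident Tax"],
    "electric_bill": ["Electric Power"],
    "gas_bill": ["Gas Company"],
}


def identify_form(ocr_lines: List[str]) -> Optional[str]:
    """Return the form type with the most keywords found in the OCR output.

    Returns None when no keyword of any form type is found.
    """
    best_form: Optional[str] = None
    best_hits = 0
    for form, keywords in FORM_KEYWORDS.items():
        # Count the keywords of this form that appear in at least one OCR line.
        hits = sum(1 for kw in keywords if any(kw in line for line in ocr_lines))
        if hits > best_hits:
            best_form, best_hits = form, hits
    return best_form
```

Calling `identify_form` with the recognized text lines of one form returns the best-matching form type, so a line containing "Electric Power" would select `electric_bill` in this hypothetical table.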

The transaction volume of an IFP application ranges from tens of thousands to hundreds of thousands of forms per day. High-speed processing technology is therefore strongly required.

2. Summary of Invention

The bottleneck in OCR processing speed is the calculation of distances between input patterns and the template data set. At a minimum, Japanese OCR systems are required to recognize the 2,965 categories of the Kanji set known as JIS (Japanese Industrial Standard) level 1 Kanji. More recently, commercial OCR systems commonly also support an additional 3,384 Kanji characters (JIS level 2 Kanji).

However, the keyword spotting process does not need to recognize such a large number of categories, because it is only required to recognize several hundred keywords at most.

By reducing the template data set, the OCR subsystem can achieve a faster processing speed. The required (reduced) template data set is extracted dynamically from the keyword strings.
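
A minimal sketch of this reduction follows, assuming that each template is a single feature vector per character and that recognition picks the template with the smallest squared Euclidean distance. Both are assumptions for illustration, as are the names `reduce_templates` and `classify`; the disclosure does not specify the template format or the distance measure.

```python
from typing import Dict, Iterable

import numpy as np


def reduce_templates(
    templates: Dict[str, np.ndarray], keywords: Iterable[str]
) -> Dict[str, np.ndarray]:
    """Keep only the templates for characters that occur in some keyword."""
    needed = set("".join(keywords))
    return {ch: vec for ch, vec in templates.items() if ch in needed}


def classify(feature: np.ndarray, templates: Dict[str, np.ndarray]) -> str:
    """Return the character whose template is nearest to the input feature.

    One distance is computed per template, so the cost grows with the
    number of templates kept.
    """
    return min(
        templates,
        key=lambda ch: float(np.sum((templates[ch] - feature) ** 2)),
    )
```

With the keywords "東京都" and "京都市", for instance, only the four distinct characters 東, 京, 都 and 市 are kept, so each input pattern is compared against four templates instead of the full set.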

In Japanese documents, many forms are written from top to bottom (vertical writing). Keywords are printed in the same way.

There are, of course, approaches that recognize vertical writing directly; however, the horizontal writing style (European writing style) is more convenient for image processing (i.e., for the OCR system).

Therefore, most image processing systems convert an image of vertical writing into one of horizontal writing by rotating the input image by 90 degrees.

In this case, each segmented character image must be rotated by 270 degrees, restoring its original orientation, before feature vectors are extracted.

However, image rotation imposes a high CPU load. Therefore, to increase speed, a new OCR system is also established that rearranges the elements of the feature vector instead of rotating the image by 270 degrees. This method is effective for an OCR subsystem that has point-symmetric feature vectors.
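
The idea of reordering feature elements instead of rotating pixels can be illustrated with a simple zone-density (mesh) feature on a square character image. The feature type is an assumption chosen because a rotation by a multiple of 90 degrees only permutes its cells; the disclosure does not state which feature the system uses. The function names are hypothetical, and the rotation direction follows NumPy's `rot90` convention (counterclockwise), which may differ from the convention in the disclosure.

```python
import numpy as np


def mesh_feature(img: np.ndarray, n: int) -> np.ndarray:
    """Zone-density feature: pixel sum in each cell of an n x n grid."""
    size = img.shape[0]
    assert img.shape == (size, size) and size % n == 0
    c = size // n  # cell edge length in pixels
    return img.reshape(n, c, n, c).sum(axis=(1, 3)).ravel()


def rotation_permutation(n: int, quarter_turns: int) -> np.ndarray:
    """Index order that turns a feature into that of the rotated image.

    The result depends only on n and the angle, so it can be computed once
    and reused for every character.
    """
    cells = np.arange(n * n).reshape(n, n)
    return np.rot90(cells, quarter_turns).ravel()


def rotate_feature(feature: np.ndarray, n: int, quarter_turns: int = 3) -> np.ndarray:
    """Reorder feature elements instead of rotating the image itself."""
    return feature[rotation_permutation(n, quarter_turns)]
```

Under these assumptions, `rotate_feature(mesh_feature(img, n), n, 3)` gives the same vector as `mesh_feature(np.rot90(img, 3), n)`, but it touches only n x n feature elements rather than every pixel of the image.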

3. Description

3.1 Improvement of the recognition speed by using dynamic template reduction
3.1.1 Implementation

One example of keyword spotting processing is the data entry work for salary payment reports, which is carried out by local government offices to calculate local government tax.