
ERROR PROTECTION IN DOCUMENT PROCESSORS WITH OCR

IP.com Disclosure Number: IPCOM000024361D
Original Publication Date: 1980-Jun-30
Included in the Prior Art Database: 2004-Apr-02
Document File: 2 page(s) / 55K

Publishing Venue

Xerox Disclosure Journal


Bruno Vieri
Henry J. Liao

Proposed Classification
U.S. Cl. 340/146.3
Int. Cl. G06K 9/00

Character and symbol encoding techniques are valuable in document processing systems for two reasons: (1) OCR makes available to standard typewriters the processing power otherwise accessible only through magnetic media (tape cassettes or cards), and (2) the encoding techniques can represent the document image data very efficiently. For example, a 4000-character page may be represented by 20M bits of raw imaginal data, 1M bits of compressed data, or as little as 32K bits of character codes plus some format overhead.
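The page-size figures quoted above can be checked with a little arithmetic. The sketch below assumes "K" means 1,000 and "M" means 1,000,000 (the disclosure does not say), and it ignores the unspecified format overhead. Under those assumptions the 32K-bit figure works out to 8 bits per character. All names are illustrative.

```python
# Figures taken from the text; K = 1e3 and M = 1e6 are assumptions.
CHARS_PER_PAGE = 4000
RAW_BITS = 20_000_000        # raw image data
COMPRESSED_BITS = 1_000_000  # compressed image data
CODE_BITS = 32_000           # character codes, format overhead excluded


def bits_per_char(total_bits: int, chars: int = CHARS_PER_PAGE) -> float:
    """Average number of bits spent on each character of the page."""
    return total_bits / chars


def saving(larger_bits: int, smaller_bits: int) -> float:
    """How many times smaller the second representation is than the first."""
    return larger_bits / smaller_bits


if __name__ == "__main__":
    for name, bits in [("raw", RAW_BITS),
                       ("compressed", COMPRESSED_BITS),
                       ("coded", CODE_BITS)]:
        print(f"{name:>10}: {bits_per_char(bits):8.1f} bits/char")
    print("raw vs coded:       ", saving(RAW_BITS, CODE_BITS))
    print("compressed vs coded:", saving(COMPRESSED_BITS, CODE_BITS))
```

On these numbers, character coding is roughly 30 times more compact than compressed image data, which is why sending part of a page as image data has a real storage cost.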

A critical disadvantage of these encoding techniques is the finite probability of the substitution of one character for another, which is particularly objectionable in numerical matter.

In OCR-related systems, unclassifiable symbols are encoded as compressed imaginal data. It is hereby proposed that numbers also be represented as compressed imaginal data. The remaining failure modes are (a) substitution of alphabetic characters for numbers, which will usually be obvious, especially in financial data, and (b) substitution by the human recipient, which is irreducible within the image quality capabilities of the system.
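The proposed routing rule can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Symbol`, `CodedChar`, and `ImagePatch` types are hypothetical, the bitmap format is left abstract, and `zlib` stands in for whatever image compression a real system would use, since the disclosure names none.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Symbol:
    """One segmented symbol as a hypothetical OCR stage might deliver it."""
    label: Optional[str]  # recognized character, or None if unclassifiable
    bitmap: bytes         # raw pixels of the symbol (format is an assumption)


@dataclass
class CodedChar:
    char: str             # stored as a compact character code


@dataclass
class ImagePatch:
    data: bytes           # stored as compressed image data


def encode(symbols: List[Symbol]) -> List[Union[CodedChar, ImagePatch]]:
    """Route each symbol to a character code or to compressed image data."""
    out: List[Union[CodedChar, ImagePatch]] = []
    for s in symbols:
        # Unclassifiable symbols and digits are both kept as image data,
        # so a digit can never be silently replaced by another digit.
        if s.label is None or s.label.isdigit():
            out.append(ImagePatch(zlib.compress(s.bitmap)))
        else:
            out.append(CodedChar(s.label))
    return out


if __name__ == "__main__":
    page = [Symbol("A", b"\x00" * 16),
            Symbol("7", b"\x01" * 16),
            Symbol(None, b"\x02" * 16)]
    for item in encode(page):
        print(type(item).__name__)
```

The sketch treats only digit characters as numeric. Decimal points, signs, and separators inside a number would need their own rule, which the disclosure does not specify. Failure mode (a) is visible here too: a digit misread as a letter gets a non-digit label and is still stored as a character code.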

Volume 5 Number 3 May/June 1980 263

