
User Interface for Verifying an Optical Character Reader Text for Document Database Entry

IP.com Disclosure Number: IPCOM000115133D
Original Publication Date: 1995-Mar-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 2 page(s) / 67K

Publishing Venue

IBM

Related People

Toyokawa, K: AUTHOR

Abstract

Disclosed is a user interface method for verifying and correcting text data that are read from a document page by an Optical Character Reader (OCR) and that generally contain some errors. The essential point of the method is to display a list of keywords together with the parts of the image data where the keywords occur, rather than the whole read text and the page image. When the recognition of a keyword is uncertain, the read keyword is displayed highlighted. By comparing each read keyword with the corresponding part of the image, an operator can verify and correct keywords easily and quickly. Because each keyword has pointers to the positions where it occurs in the read text, a correction is reflected in the text at the same time.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 68% of the total text.

User Interface for Verifying an Optical Character Reader Text for
Document Database Entry

      Disclosed is a user interface method for verifying and
correcting text data that are read from a document page by an
Optical Character Reader (OCR) and that generally contain some
errors.  The essential point of the method is to display a list of
keywords together with the parts of the image data where the
keywords occur, rather than the whole read text and the page image.
When the recognition of a keyword is uncertain, the read keyword is
displayed highlighted.  By comparing each read keyword with the
corresponding part of the image, an operator can verify and correct
keywords easily and quickly.  Because each keyword has pointers to
the positions where it occurs in the read text, a correction is
reflected in the text at the same time.  The corrected text is good
enough for a document filing and retrieval system in which content
is searched by a set of keywords.
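
      The keyword-to-text pointers described above can be sketched
roughly as follows.  This is only an illustration under stated
assumptions, not the disclosed implementation: the names Keyword,
ReadText, uncertain and correct are hypothetical, every token is
treated as a keyword, the confidence scale of 0 to 1 and the 0.8
threshold are invented for the example, and text positions are
modeled as indices into a token list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Keyword:
    text: str            # keyword as read by the OCR
    confidence: float    # assumed recognition confidence in [0, 1]
    positions: List[int] = field(default_factory=list)  # indices into the token list


class ReadText:
    """OCR text held as a token list, plus a keyword index pointing into it."""

    def __init__(self, tokens, confidences, threshold=0.8):
        self.tokens = list(tokens)
        self.threshold = threshold          # hypothetical cut-off for highlighting
        self.keywords: Dict[str, Keyword] = {}
        for i, (tok, conf) in enumerate(zip(tokens, confidences)):
            kw = self.keywords.setdefault(tok, Keyword(tok, conf))
            kw.positions.append(i)
            # A keyword is as uncertain as its least certain occurrence.
            kw.confidence = min(kw.confidence, conf)

    def uncertain(self):
        """Keywords that would be shown highlighted to the operator."""
        return [k.text for k in self.keywords.values()
                if k.confidence < self.threshold]

    def correct(self, old, new):
        """Apply an operator correction to the list and to the text at once."""
        kw = self.keywords.pop(old)
        for i in kw.positions:              # follow the pointers into the text
            self.tokens[i] = new
        target = self.keywords.get(new)
        if target is None:
            # Operator-verified keyword: treated here as fully certain.
            self.keywords[new] = Keyword(new, 1.0, kw.positions)
        else:
            # The corrected spelling already exists: merge the pointers.
            target.positions = sorted(target.positions + kw.positions)
```

      In this sketch, a single call such as
correct("charactcr", "character") would update every occurrence of
the misread keyword in the token list, which is the sense in which
the correction is reflected in the text at the same time.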

      The Figure shows an example of the front-of-screen layout.  The
box called "Keyword List" shows the keywords read from a document
page in the right column and, in the left column, the parts of the
document image extracted for comparison.  Behind the Keyword List
box are a document image box titled "Page Image" and a read text box
titled "Reco Results".  A user can verify
and correct each keyword by comparing each keyword image in the
Keyword List box with a keyboard,...