
Automatic Handwriting Recognition Enhancement Based on Syntactic and Semantic Cues

IP.com Disclosure Number: IPCOM000105167D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 142K

Publishing Venue

IBM

Related People

Bellegarda, JR: AUTHOR [+4]

Abstract

A computational technique is proposed for identifying and correcting errors made by an automatic handwriting recognizer (AHR). It is based on the use of a syntactic parser and semantic cues for words found in on-line dictionaries and corpora. This method is shown to improve the quality of the output of the IBM handwriting recognition system.

This is the abbreviated version, containing approximately 37% of the total text.

      1.  Background - Automatic recognition systems aiming to
recognize natural speech (cf. [1]) or handwriting (cf. [2])
frequently make decoding errors that result in syntactically and
semantically incorrect sentences.  In addition, in the case of
automatic handwriting recognition, lexical errors are common because
recognition is performed at the character level.  For example, a
typical output of the current IBM handwriting recognizer [2] might
read:

  I do no+wan+my office to be a bo++leneck in an7 aP9roual process.

As was recognized early on, such outputs can be partially corrected
by replacing commonly confused characters with the letters they are
likely to stand for.  This typically eliminates lexical errors, as in:

  I do not want my office to be a bottleneck in (and any) approval process.
where (and any) represents two candidates for an7.  Clearly, only one
of these candidates (any) produces a sentence with a valid syntactic
structure in English.  Syntactic analysis of this sentence would
thus select any and yield the correct sentence.
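A minimal sketch of the substitution and syntactic-filtering idea follows, applied to the phrase "in an7 aP9roual process". Everything in it is a hypothetical illustration, not the disclosure's method: the confusion table stands in for a confusability matrix learned from training text, the tagged lexicon stands in for an on-line dictionary, and a small table of allowed part-of-speech bigrams stands in for a real syntactic parser. The names `expand_token`, `is_grammatical`, and `correct` are likewise illustrative.

```python
from itertools import product

# Hypothetical confusion table: recognized symbol -> letters it may stand for.
# A real system would derive this from a confusability matrix.
CONFUSIONS = {
    "+": ["t"],
    "7": ["d", "y"],
    "9": ["p", "g"],
    "P": ["p"],
    "u": ["u", "v"],
}

# Hypothetical toy lexicon mapping each word to one part-of-speech tag.
LEXICON = {
    "in": "PREP",
    "and": "CONJ",
    "any": "DET",
    "approval": "NOUN",
    "process": "NOUN",
    "bottleneck": "NOUN",
}

# Hypothetical stand-in for a parser: tag pairs allowed to be adjacent.
ALLOWED_BIGRAMS = {
    ("PREP", "DET"), ("PREP", "NOUN"), ("DET", "NOUN"),
    ("NOUN", "NOUN"), ("NOUN", "CONJ"), ("CONJ", "NOUN"),
}


def expand_token(token):
    """Dictionary words reachable by replacing confusable symbols."""
    if token in LEXICON:
        return [token]
    options = [CONFUSIONS.get(ch, [ch]) for ch in token]
    words = {"".join(chars) for chars in product(*options)}
    return sorted(w for w in words if w in LEXICON)


def is_grammatical(words):
    """True if every adjacent pair of tags is an allowed bigram."""
    tags = [LEXICON[w] for w in words]
    return all(pair in ALLOWED_BIGRAMS for pair in zip(tags, tags[1:]))


def correct(tokens):
    """Candidate word sequences that pass the toy syntactic check."""
    candidate_lists = [expand_token(t) for t in tokens]
    return [list(seq) for seq in product(*candidate_lists) if is_grammatical(seq)]


print(correct(["in", "an7", "aP9roual", "process"]))
# prints [['in', 'any', 'approval', 'process']]
```

With this toy grammar, the preposition-conjunction sequence in "in and approval" is rejected, so only the "any" reading survives; a real parser would check the structure of the whole sentence rather than adjacent tag pairs.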

      Often, however, several candidates have the correct part of
speech, in which case syntactic analysis is not sufficient to
disambiguate the sentence.  Consider, for example, the following
sentence where three candidates were produced for the recognized
string haue:

  Such a reading room would also (have hate haze) an outside view.

      The only way to produce a correct sentence in this case is to
determine that hate and haze are semantically inappropriate because
the subject of the verb under consideration (room) has no HUMAN
senses, while both of these verbs prefer HUMAN subjects (as in "His
fraternity brothers hazed him regularly" and "Bush hates broccoli").
This requires semantic analysis of the sentence.
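The semantic check can be sketched as a selectional-restriction test on the subject of each candidate verb. The sense inventory and verb preferences below are hypothetical hand-written stand-ins for information the disclosure says is drawn from on-line dictionaries and corpora, and the function names `semantically_compatible` and `filter_verbs` are illustrative only.

```python
# Hypothetical sense inventory: noun -> set of semantic senses.
NOUN_SENSES = {
    "room": {"LOCATION", "ARTIFACT"},
    "brothers": {"HUMAN"},
}

# Hypothetical subject preferences: verb -> senses its subject should have.
# None means the verb places no restriction on its subject.
VERB_SUBJECT_PREFS = {
    "have": None,
    "hate": {"HUMAN"},
    "haze": {"HUMAN"},
}


def semantically_compatible(verb, subject):
    """True if the subject has at least one sense the verb prefers."""
    prefs = VERB_SUBJECT_PREFS.get(verb)
    if prefs is None:  # unrestricted (or unknown) verb: accept
        return True
    return bool(prefs & NOUN_SENSES.get(subject, set()))


def filter_verbs(candidates, subject):
    """Keep only the candidate verbs whose subject preference is satisfied."""
    return [v for v in candidates if semantically_compatible(v, subject)]


print(filter_verbs(["have", "hate", "haze"], "room"))      # prints ['have']
print(filter_verbs(["have", "hate", "haze"], "brothers"))  # prints ['have', 'hate', 'haze']
```

For the subject room only have survives; for a HUMAN subject such as brothers all three verbs remain, so other cues would be needed to choose among them.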

      2.  Recognition Enhancement Procedure - Given a sentence
generated by a handwriting recognizer, such as the recognizer
described in [2] for run-on handwritten characters, processing
comprises the following steps.

1.  Non-word tokens are replaced by candidate lists using traditional
    correction methods.  Two alternatives are (i) substituting
    commonly confused characters based on a confusability matrix
    accumulated over some representative training text, and (ii) using
    a standard spelling correction algorithm, such as PROOF.  A
    combination of the two can also be considered.
2.  Candidate sentences made up of combinations of candidate words
    are processed by...
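For alternative (ii) of step 1, the internals of PROOF are not described here, so the sketch below swaps in plain Levenshtein edit distance against a word list as a stand-in. The functions `edit_distance` and `spelling_candidates`, the `max_dist` threshold, and the toy lexicon are all assumptions for illustration. Applied to the recognized string haue with a threshold of one edit, it yields a three-way candidate list like the one discussed in the background section.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def spelling_candidates(token, lexicon, max_dist=2):
    """Return lexicon words within max_dist edits of token, closest first."""
    scored = sorted((edit_distance(token, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_dist]


# Hypothetical toy lexicon standing in for a full dictionary.
LEXICON = {"have", "hate", "haze", "room", "view", "outside"}

print(spelling_candidates("haue", LEXICON, max_dist=1))
# prints ['hate', 'have', 'haze']
```

Candidate lists from alternatives (i) and (ii) could then be merged, for example by set union, before candidate sentences are formed from their combinations.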