Decoding of a Consistent Message using Both Speech and Handwriting Recognition

IP.com Disclosure Number: IPCOM000103716D
Original Publication Date: 1993-Jan-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 4 page(s) / 208K

Publishing Venue

IBM

Related People

Bellegarda, JR: AUTHOR [+2]

Abstract

Algorithms are developed for the decoding of a consistent message using both speech and handwriting recognition. For each word the total likelihood score is assumed to be the weighted sum of the two likelihood scores resulting from the separate evaluation of the spoken and handwritten evidence. Emphasis is placed on the estimation of the weighting factors used in forming this total likelihood.

This is the abbreviated version, containing approximately 30% of the total text.

      Let A(W;AC) be some (e.g., likelihood) score attached to a word
W from some vocabulary when AC is the observed acoustic sequence.
Similarly, let H(W;HW) be a score attached to a string of characters
W when HW is the observed chirographic sequence (i.e., the sequence
of strokes containing the handwriting information).  The standard
decoding procedure for speech (respectively, handwriting) recognition
is to select from a list of candidate words the word W that
maximizes the score A(*;AC) (respectively, H(*;HW)).  One way to
combine speech and handwriting recognition is to maximize an
overall score c(W;AC;HW) formed as a weighted sum of the two
scores A(*;AC) and H(*;HW):

            c(W;AC;HW) = a A(W;AC) + b H(W;HW)                 (1)

      Clearly, if each of the scores A(*;AC) and H(*;HW) attains its
maximum on the same word, this word is the best estimate for the
piece of message jointly conveyed by AC and HW.  The problem becomes
more difficult when the two scores attain their maximum on different
words in the vocabulary, because the final decision will depend on
the weighting factors a and b.  The algorithms described herein
provide for the automatic determination of the weighting factors a
and b.
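
      The effect of the weighting factors on the final decision can
be illustrated with a minimal sketch.  It assumes that the overall
score is the weighted sum a A(W;AC) + b H(W;HW) and that both scores
are available as per-word tables.  The function name, the candidate
words, and all numeric values are hypothetical and chosen only for
illustration; they do not come from the disclosure.

```python
def decode(candidates, acoustic_score, handwriting_score, a, b):
    """Return the candidate W maximizing a*A(W;AC) + b*H(W;HW).

    acoustic_score and handwriting_score are dicts mapping each
    candidate word to its score for the observed AC and HW.
    """
    return max(
        candidates,
        key=lambda w: a * acoustic_score[w] + b * handwriting_score[w],
    )


# Hypothetical scores: the two recognizers disagree on the best word.
candidates = ["cat", "cot"]
A = {"cat": 0.6, "cot": 0.4}   # speech alone prefers "cat"
H = {"cat": 0.3, "cot": 0.7}   # handwriting alone prefers "cot"

# The chosen word depends entirely on the weighting factors.
speech_heavy = decode(candidates, A, H, a=0.8, b=0.2)
handwriting_heavy = decode(candidates, A, H, a=0.2, b=0.8)
```

      With these illustrative numbers, weighting speech more heavily
selects "cat", while weighting handwriting more heavily selects
"cot", which is why the estimation of a and b matters.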

      Inherently, these weighting factors should vary in accordance
with the degree of confidence that can be placed at any given time in
the speech or handwriting recognition process, respectively.  This
implies that they should reflect the intrinsic quality of the user's
speech and handwriting (determined from user-dependent
characteristics, such as age, sex, sore throat, aching hands,
tiredness, accent, writing habits, etc.), the environment under which
both acoustic and chirographic evidence were produced (including the
effects of ambient acoustic or vibration noise and the quality of the
recording equipment), as well as the relative difficulty of the task
at hand (evidenced by the distribution of likelihood scores over the
set of candidate words).  This leads to two main classes of
algorithms, depending on whether the coefficients a and b are
estimated primarily on the basis of the current acoustic and
chirographic data, or on the basis of past and present environmental
conditions.
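
      As a rough illustration of the first class (weights derived
from the current data), the sketch below uses the gap between the
best and second-best score of each recognizer as a confidence
measure and normalizes the two gaps so that a + b = 1.  This margin
heuristic, the normalization, and all names and values are
assumptions made for illustration only; the disclosure's own
estimation procedure is not reproduced here.

```python
def margin_confidence(scores):
    """Gap between the best and second-best score (hypothetical measure).

    A flat score distribution (small gap) suggests a hard decision and
    therefore low confidence in that recognizer. Needs >= 2 candidates.
    """
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1]


def estimate_weights(acoustic_scores, handwriting_scores):
    """Return (a, b) proportional to each recognizer's margin, a + b = 1."""
    conf_a = margin_confidence(acoustic_scores)
    conf_h = margin_confidence(handwriting_scores)
    total = conf_a + conf_h
    if total == 0:
        # Neither recognizer separates its top candidates: weight equally.
        return 0.5, 0.5
    return conf_a / total, conf_h / total


# Hypothetical scores: speech is nearly undecided, handwriting is clear.
A = {"cat": 0.5, "cot": 0.4, "cut": 0.1}
H = {"cat": 0.1, "cot": 0.8, "cut": 0.1}

a, b = estimate_weights(A, H)
```

      Here the handwriting scores separate the candidates far more
sharply than the speech scores, so b comes out larger than a.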

      Throughout, the speech and handwriting recognition processes
are assumed t...