Enhanced Methods for Spelling Names in Speech Recognition Systems

IP.com Disclosure Number: IPCOM000116769D
Original Publication Date: 1995-Nov-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 104K

Publishing Venue

IBM

Related People

Sherwin Jr, EB: AUTHOR

Abstract

Disclosed are enhanced methods for providing a proper name as an input to a speech recognition system unable to recognize names reliably using a basic method, such as requiring the speaker, in the manner of a spelling bee, to first pronounce the name being sought and then to give the spelling. With this basic method, statistical probabilities associated with both of these utterances are combined to identify the highest-probability candidate. A first list is generated of candidates chosen from the spoken name, and a second list is generated of candidates chosen from the spelled name.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed are enhanced methods for providing a proper name as
an input to a speech recognition system unable to recognize names
reliably using a basic method, such as requiring the speaker, in the
manner of a spelling bee, to first pronounce the name being sought
and then to give the spelling.  With this basic method, statistical
probabilities associated with both of these utterances are combined
to identify the highest-probability candidate.  A first list is
generated of candidates chosen from the spoken name, and a second
list is generated of candidates chosen from the spelled name.

      These enhanced methods consist of various additional steps
taken by the speech recognition system.  For example, the user is
asked to say the name a second time, allowing the system to combine
three utterances and select the highest-probability candidate from
the combined statistics.  In another example, the user is asked how
many letters are in the name.  The answer to this question eliminates
80 to 90 percent of the names in a typical list.  While it may seem
unlikely that a system incapable of recognizing a name when it is
spoken and spelled can be helped in this way, in practice the number
of letters typically eliminates many entries in both the first and
second lists described above, often making the choice of the correct
candidate much easier.
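
      The letter-count step can be illustrated with a minimal sketch.
The function name, the candidate lists, and the scores below are
hypothetical and are not taken from the disclosure; the sketch only
assumes that each list maps a candidate spelling to a probability.

```python
def filter_by_length(candidates, letter_count):
    """Keep only candidates whose spelling has the stated number of letters."""
    return {
        name: score
        for name, score in candidates.items()
        # Count letters only, so hyphens or apostrophes do not affect the match.
        if sum(ch.isalpha() for ch in name) == letter_count
    }


# Hypothetical first (spoken) and second (spelled) candidate lists.
spoken = {"Johnson": 0.30, "Jonson": 0.25, "Johnston": 0.25, "Jensen": 0.20}
spelled = {"Johnston": 0.35, "Johnson": 0.33, "Jonson": 0.32}

# The user reports that the name has seven letters.
spoken_kept = filter_by_length(spoken, 7)
spelled_kept = filter_by_length(spelled, 7)
print(spoken_kept, spelled_kept)
```

In this made-up case both lists collapse to a single entry, which is
the effect described above: neither utterance alone was decisive, but
the letter count removes the close competitors.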

      With some of the enhanced methods, statistical inferences are
drawn from the population represented by the directory.  The
probability assigned to a first name is based on how often it occurs
in the directory.  For example, because the statistics are biased in
favor of John, Johan is picked over John only when Johan is clearly
heard.  Choices for first names are also weighted according to the
probabilities of certain first and last names occurring together as
bi-grams.  For example, Abraham is an unlikely first name to be used
with Woo or Yamamoto.  The bi-gram statistics can be developed from
white-pages directories published on compact disc.
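
      One way to picture the directory-based biasing is the sketch
below.  The disclosure does not say how the statistics are computed,
so the additive smoothing, the function names, and the tiny directory
are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical directory entries as (first name, last name) pairs.
directory = [
    ("John", "Smith"), ("John", "Woo"), ("John", "Cohen"),
    ("John", "Baker"), ("Johan", "Smith"), ("Abraham", "Cohen"),
]

first_counts = Counter(first for first, _ in directory)
last_counts = Counter(last for _, last in directory)
pair_counts = Counter(directory)
SMOOTHING = 0.1  # assumed additive smoothing so unseen names keep a small probability


def first_name_prior(first):
    """Smoothed relative frequency of a first name in the directory."""
    vocab = len(first_counts)
    return (first_counts[first] + SMOOTHING) / (len(directory) + SMOOTHING * vocab)


def first_given_last(first, last):
    """Smoothed bi-gram estimate of P(first name | last name)."""
    vocab = len(first_counts)
    return (pair_counts[(first, last)] + SMOOTHING) / (last_counts[last] + SMOOTHING * vocab)


def rescore(acoustic, last=None):
    """Multiply each recognizer score by a directory prior and renormalize."""
    weighted = {
        name: score * (first_given_last(name, last) if last else first_name_prior(name))
        for name, score in acoustic.items()
    }
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# A slight acoustic preference for Johan is overridden by the prior for John.
close_call = rescore({"John": 0.45, "Johan": 0.55})
# A clearly heard Johan still wins.
clear_call = rescore({"John": 0.10, "Johan": 0.90})
print(max(close_call, key=close_call.get), max(clear_call, key=clear_call.get))
```

Passing a last name, as in rescore(scores, last="Woo"), switches from
the first-name prior to the bi-gram estimate, so a first name that
never appears with that last name in the directory is penalized.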

      Statistics from different utterances are given different
weights.  For example, in evaluating the results of the basic method,
the spoken name is weighted at 40 percent, while the spelled name is
weighted at 60 percent.  A person asked to repeat the pronunciation
of a name generally enunciates more clearly and speaks more loudly,
so the second effort is given a greater weight than the first.  When
the statistics of different utterances differ too much, an exception
condition is generated.  If the second utterance is recognized as a
completely different word, or if the highest-probability candidate on
one list has a low probability on the other, it is determined that
one of the utterances was garbled.
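
      The weighting and the exception condition can be sketched as
follows.  The disclosure gives the 40 and 60 percent weights but not
the combining formula, so the weighted sum used here, the threshold
of 0.05, and all names and scores are assumptions for illustration.

```python
def combine(lists, weights):
    """Weighted sum of per-utterance scores; returns (best name, all scores)."""
    names = set().union(*lists)
    combined = {
        # A candidate missing from a list contributes zero for that utterance.
        name: sum(w * scores.get(name, 0.0) for scores, w in zip(lists, weights))
        for name in names
    }
    return max(combined, key=combined.get), combined


def looks_garbled(list_a, list_b, low=0.05):
    """True when either list's top candidate scores below `low` on the other list."""
    top_a = max(list_a, key=list_a.get)
    top_b = max(list_b, key=list_b.get)
    return list_b.get(top_a, 0.0) < low or list_a.get(top_b, 0.0) < low


# Hypothetical scores for the spoken and the spelled utterance.
spoken = {"Harmon": 0.5, "Hammond": 0.4, "Armand": 0.1}
spelled = {"Harmon": 0.6, "Hammond": 0.3, "Harman": 0.1}

# Spoken name weighted at 40 percent, spelled name at 60 percent.
best, scores = combine([spoken, spelled], [0.4, 0.6])
print(best, looks_garbled(spoken, spelled))

# A spoken result that shares no strong candidate with the spelling is flagged.
noisy_spoken = {"Baker": 0.9, "Harmon": 0.1}
print(looks_garbled(noisy_spoken, spelled))
```

A repeated pronunciation could be handled the same way by passing a
third list with a larger weight than the first pronunciation; what the
system does after flagging a garbled utterance is not covered by this
sketch.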

      With ot...