Combining Multiple Acoustic Models to Retrieve Data

IP.com Disclosure Number: IPCOM000116802D
Original Publication Date: 1995-Nov-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 63K

Publishing Venue

IBM

Related People

Cohen, PS: AUTHOR [+2]

Abstract

Disclosed is a speech recognition system, operating on a computer system, that combines statistics from a spoken name and from the same name spelled out in order to select a name from a directory or list. The resulting composite statistical model is far more likely to choose the correct name, or to yield a shorter list of candidates including the correct name, than an interpretation of the spoken name or the spelled name alone. The technique works because the typical errors made during the recognition of spelled names, such as confusion among the characters "b," "v," "d," and "e," do not typically occur as errors in the interpretation of the corresponding spoken names.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      This method is used, for example, in a telephone directory
application: when an attempted selection based on the spoken name
yields a list of possible candidates rather than a single name, the
caller is asked to spell the last name.  Often the spoken name and
the spelled name yield different lists of possible candidates, with
the correct candidate being the only name appearing on both lists.
In interpreting such data, the system uses tri-gram models, in which
the preceding spoken syllables determine the relative probabilities
of the syllables that follow, according to a statistical model based
on the particular directory being searched.  Furthermore, while
spelling works poorly for determining "out of vocabulary" words
through speech recognition, it works relatively well in situations
having known n-grams, such as a list of names.
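
      The combination of the two candidate lists described above can
be sketched as follows.  This is only an illustration under stated
assumptions, not the disclosed implementation: the disclosure does
not say how the statistics are combined, so the sketch assumes each
recognition pass returns a probability per directory name and that
the two passes can be treated as independent, so their log
probabilities are added.  The function names, the floor value, and
the sample names and scores are all hypothetical.

```python
import math


def on_both_lists(spoken, spelled):
    """Return the names that appear in both candidate lists, sorted."""
    return sorted(set(spoken) & set(spelled))


def combine_candidates(spoken, spelled, floor=1e-6):
    """Rank names by the sum of their spoken and spelled log probabilities.

    spoken, spelled: dicts mapping a directory name to the probability
    assigned by that recognition pass (hypothetical inputs).
    A name missing from one list is given the small `floor` probability
    (an assumption) rather than zero, so it is ranked low, not discarded.
    """
    names = set(spoken) | set(spelled)
    scored = [
        (name,
         math.log(spoken.get(name, floor)) + math.log(spelled.get(name, floor)))
        for name in names
    ]
    # Highest combined score first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Made-up scores: the spoken pass confuses similar-sounding names, while
# the spelled pass confuses the letters "b", "d", and "v".
spoken = {"Bauer": 0.40, "Bower": 0.35, "Bowers": 0.25}
spelled = {"Bauer": 0.50, "Dauer": 0.30, "Vauer": 0.20}

shared = on_both_lists(spoken, spelled)
ranking = combine_candidates(spoken, spelled)
print(shared)
print(ranking[0][0])
```

      In this made-up case each pass alone leaves three candidates,
but only one name appears on both lists, and the same name has the
highest combined score.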

      When a single choice still cannot be made, the caller is asked
to give only the first letter of the first name...