Making Use of Right Context in a Speech Recognizer

IP.com Disclosure Number: IPCOM000106078D
Original Publication Date: 1993-Sep-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 2 page(s) / 84K

Publishing Venue

IBM

Related People

Bahl, L: AUTHOR [+6]

Abstract

It is well-known that the pronunciation of a word depends on the context in which it occurs. In particular, the pronunciation of a word depends both on the words that precede and follow it. To model this co-articulation, context-dependent acoustic models are employed [1,3,4]. Such models require knowledge of the current word, the preceding words, and the following words.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Making Use of Right Context in a Speech Recognizer

      It is well-known that the pronunciation of a word depends on
the context in which it occurs.  In particular, the pronunciation of
a word depends both on the words that precede and follow it.  To
model this co-articulation, context-dependent acoustic models are
employed [1,3,4].  Such models require knowledge of the current word,
the preceding words, and the following words.
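For illustration only (this sketch is not part of the disclosure), a common way to realize such context dependence is to select an acoustic model keyed on the current phone together with its immediate left and right neighbors, a triphone-style key. The function name and boundary symbol below are hypothetical:

```python
# Illustrative sketch: select a context-dependent acoustic model by a
# (left, current, right) "triphone" key.  '#' is a hypothetical marker
# for a word or utterance boundary.
def context_key(phones, i):
    """Return the (left, current, right) key for phone i in a sequence."""
    left = phones[i - 1] if i > 0 else "#"
    right = phones[i + 1] if i < len(phones) - 1 else "#"
    return (left, phones[i], right)

# Phones for the word "cat": /k ae t/
print(context_key(["k", "ae", "t"], 1))  # ('k', 'ae', 't')
```

A model inventory indexed by such keys can then supply a different set of acoustic parameters for each phonetic context.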

      If one has a general model for context dependence in which the
acoustic models can actually depend on several words in the past and
the future, one is faced with a very difficult recognition problem.
Systems that rely on Viterbi decoding [3,4,5] simplify the problem by
limiting the context dependence of the acoustic models to one phone
on either side of the current phone.  Another solution would be to
generate the N-best sentence hypotheses.  Neither of these solutions
is satisfactory, because the initial use of substantially weakened
models will introduce recognition errors.

      In a speech recognition system that employs a stack decoder,
such as the IBM Speech Recognition System [2], recognition is
performed in a left-to-right fashion as a function of time.
Therefore, when recognizing a particular word in a sentence, one
always knows all the preceding words, and it is therefore
straightforward to employ models whose parameters depend both on the
current word and all the words that precede it.  However, the words
to the right of the current word are unknown, which prevents one from
directly making use of word context to the right of the current word.
This disclosure describes a procedure for incorporating models that
depend on right context in a large-vocabulary speech recognition
system that employs a stack decoder.  A detailed description of the
invention follows.
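The left-to-right stack search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the `extend` and `is_complete` callbacks are hypothetical, with `extend` supplying candidate next words and their incremental log-likelihoods:

```python
import heapq

# A minimal sketch of a stack decoder: partial word paths are kept in
# a priority queue ordered by log-likelihood, and the most likely path
# is repeatedly popped and extended word by word, left to right.
def stack_decode(extend, is_complete, initial_path, initial_score,
                 max_steps=1000):
    # heapq is a min-heap, so scores are negated to pop the *highest*
    # likelihood path first.
    stack = [(-initial_score, initial_path)]
    for _ in range(max_steps):
        if not stack:
            return None
        neg_score, path = heapq.heappop(stack)
        if is_complete(path):
            return path, -neg_score
        # `extend` yields (next_word, incremental_log_likelihood)
        # pairs; every preceding word in `path` is already known, so
        # left-context models apply directly at this point.
        for word, delta in extend(path):
            heapq.heappush(stack, (neg_score - delta, path + (word,)))
    return None
```

Because the most likely partial path is always extended first, the search never needs to weaken its acoustic models for left context; the procedure below addresses how right context can also be exploited at each extension.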

      In a stack decoder, the active paths, consisting of strings of
words, are kept in a stack ordered by path likelihood.  The decoder
extends those paths on the stack with the highest likelihood values.
To take advantage of right context beyond the word boundary, the
following procedure is applied whenever a path is extended:

1.  Decide which path is to be extended.

2. ...