
Procedures for Consistent Modelling of Articulatory Cues in Continuous Speech Recognition

IP.com Disclosure Number: IPCOM000100292D
Original Publication Date: 1990-Mar-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 2 page(s) / 93K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+6]

Abstract

Accurate continuous speech recognition requires that the effects of coarticulation be modelled. In one prior approach, each phone P is modelled in a context-dependent fashion by one of several Markov models. The phonological rules that determine the appropriate model for P from the context of P are determined automatically from training utterances. For these automatically determined rules to be useful, the phone contexts must be determined by consistent criteria during rule construction and recognition. However, many words can be pronounced in several different ways (called lexemes), so the phone context of a given phone P cannot be determined uniquely from knowledge of the word sequence in the neighborhood of P.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.


      Accurate continuous speech recognition requires that the
effects of coarticulation be modelled.  In one prior approach, each
phone P is modelled in a context-dependent fashion by one of several
Markov models.  The phonological rules that determine the
appropriate model for P from the context of P are determined
automatically from training utterances.  For these automatically
determined rules to be useful, the phone contexts must be determined
by consistent criteria during rule construction and recognition.
However, many words can be pronounced in several different ways
(called lexemes), so the phone context of a given phone P cannot be
determined uniquely from knowledge of the word sequence in the
neighborhood of P.  For this reason, it is convenient to work in
units of lexemes instead of words when recognizing continuous
speech.  During training, however, it is more convenient for
speakers if their training scripts are written in terms of words
rather than lexemes.  This conflict can be resolved by providing a
word-based training script and determining after the fact which
lexemes the speaker uttered wherever a choice was possible.
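
The ambiguity described above can be illustrated with a minimal sketch. The lexicon, the phone symbols, and both function names are hypothetical and chosen only for illustration; they do not come from the disclosure. Knowing only the previous word leaves several candidate left contexts for the next phone, whereas knowing the previous lexeme leaves exactly one.

```python
# Hypothetical lexicon: each word maps to its lexemes (phone sequences).
# The pronunciations are illustrative assumptions, not taken from the text.
LEXICON = {
    "the": [("DH", "AH"), ("DH", "IY")],
    "apple": [("AE", "P", "AH", "L")],
}


def left_contexts_from_word(prev_word, lexicon):
    """Return every phone that could immediately precede the next word
    when only the previous WORD is known."""
    return sorted({lexeme[-1] for lexeme in lexicon[prev_word]})


def left_context_from_lexeme(prev_lexeme):
    """Return the single preceding phone once the previous LEXEME is known."""
    return prev_lexeme[-1]


if __name__ == "__main__":
    # Word-level knowledge: more than one candidate context.
    print(left_contexts_from_word("the", LEXICON))
    # Lexeme-level knowledge: the context is unique.
    print(left_context_from_lexeme(LEXICON["the"][1]))
```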

      Furthermore, during training and recognition the speaker is
free to pause between words wherever he or she wishes.  Pauses are
important cues and must be reflected in the phone context along with
the neighboring phones.  Thus, the presence of pauses must be
detected during both training and decoding, and the criteria for
pause-detection must be consistent. This invention details training
and recognition procedures which satisfy the above requirements.
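
One way to picture the consistency requirement is to derive phone contexts with a single routine that is shared by training and decoding. The sketch below is only an assumption about how this might look: the `SIL` symbol, the one-phone context window on each side, and the treatment of utterance boundaries as silence are all illustrative choices, not details given in the disclosure.

```python
SIL = "SIL"  # hypothetical symbol standing for a detected pause


def phone_contexts(lexemes, pause_after):
    """Return (left, phone, right) triples for every phone in an utterance.

    lexemes:     list of phone tuples, one per uttered lexeme
    pause_after: list of booleans, True where a pause followed that lexeme

    Assumption: the utterance starts and ends in silence, and the context
    is one symbol wide on each side.
    """
    stream = [SIL]
    for lexeme, paused in zip(lexemes, pause_after):
        stream.extend(lexeme)
        if paused:
            stream.append(SIL)
    if stream[-1] != SIL:
        stream.append(SIL)
    # Pauses appear as context symbols but get no triple of their own.
    return [
        (stream[i - 1], stream[i], stream[i + 1])
        for i in range(1, len(stream) - 1)
        if stream[i] != SIL
    ]


if __name__ == "__main__":
    utterance = [("DH", "AH"), ("K", "AE", "T")]
    print(phone_contexts(utterance, [False, False]))
    print(phone_contexts(utterance, [True, False]))
```

Calling the same function in both phases means a pause between two lexemes changes the context of the boundary phones in the same way whether the data is being used to build rules or to recognize speech.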

      During training the following steps are performed.
      Step 1.  Using phonological rules which do not extend beyond
word boundaries, or possibly no rules at all, construct a Markov
model for each lexeme in the vocabulary.
      Step 2.  For each word W in the vocabulary, create a Markov word
model for W from the lexemes of W by linking together all the lexeme
models in parallel.
      Step 3.  Append to each word model a deletable Markov model to
represent silence (a pause).
      Step 4.  Obtain trained statistics for the c...
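
The topology built in Steps 1 to 3 can be sketched as follows. This is a simplified illustration under stated assumptions: each lexeme model is reduced to a plain phone sequence rather than a real Markov model with states and transition probabilities, and the names `WordModel`, `build_word_models`, and `SIL` are hypothetical.

```python
from dataclasses import dataclass

SIL = "SIL"  # hypothetical label for the silence model


@dataclass
class WordModel:
    """Parallel lexeme branches followed by a deletable silence model."""
    word: str
    lexemes: list        # Step 1: one (simplified) model per lexeme
    silence: str = SIL   # Step 3: trailing silence that may be skipped

    def paths(self):
        """Enumerate every phone path through the word model."""
        result = []
        for lexeme in self.lexemes:  # Step 2: branches linked in parallel
            result.append(tuple(lexeme))                    # silence skipped
            result.append(tuple(lexeme) + (self.silence,))  # silence taken
        return result


def build_word_models(lexicon):
    """Create one word model per vocabulary word from its lexemes."""
    return {word: WordModel(word, lexemes) for word, lexemes in lexicon.items()}


if __name__ == "__main__":
    # Illustrative pronunciations only.
    models = build_word_models({"the": [("DH", "AH"), ("DH", "IY")]})
    for path in models["the"].paths():
        print(path)
```

Because every lexeme of a word and the optional pause are all present in one model, aligning a word-based training script against the audio can reveal which branch the speaker actually took.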