Corrective Training of Feneme-Based Markov Models for Discrimination between Highly Confusable Words

IP.com Disclosure Number: IPCOM000104478D
Original Publication Date: 1993-Apr-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 134K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+6]

Abstract

In the context of a connected utterance speech recognition system, corrective training is applied to estimate the parameter values of multi-arc Markov word models for highly confusable words. This leads to a smaller number of recognition errors.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 41% of the total text.

      In an automatic speech recognition system such as the one
described in [1], the pronunciation of each word is represented by a
hidden Markov model.  Such Markov word models (or baseforms) are
usually composed from a small inventory of feneme-based sub-word
models [2,3], which gives them a degree of flexibility and robustness
that makes the recognition of connected-utterance speech possible.
It has remained difficult, however, to produce highly discriminatory
baseforms that can resolve the fine differences between two or more
acoustically confusable words.

      The solution outlined below applies corrective training [4] to
the so-called "multi-arc" Markov models developed in [3] to enhance
their power of discrimination.  This is made possible by the fact
that, unlike the baseforms of [2], the multi-arc baseforms of [3]
are obtained through maximum likelihood estimation [5].  Corrective
training refines the maximum likelihood estimates of the parameters
of the multi-arc baseforms by iteratively adjusting these parameters
so as to make correct words more probable and incorrect words less
probable.  Since any parameter adjustment may introduce new errors,
this approach should be reserved for cases where the probabilities of
the correct and incorrect words are so close that the recognizer is
not able to tell them apart.  Such cases arise especially for
acoustically confusable words such as "a" and "the," whose
recognition is very sensitive to the effects of context-dependence
and co-articulation.  In such situations corrective training will
increase the difference between the probabilities of the correct and
incorrect words, thereby improving the robustness of the recognizer.
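
      The idea of adjusting parameters only when the correct and
incorrect words score too closely can be illustrated with a minimal
Python sketch.  Everything specific in it is an assumption made for
illustration and is not taken from this disclosure: each word model is
reduced to a table of counts indexed by (feneme, arc) pairs, the
training data are assumed to be already aligned into such pairs, and
the names log_score, corrective_step, margin, step and FLOOR, as well
as the additive update rule itself, are hypothetical.

```python
import math
from collections import Counter

FLOOR = 1e-3  # assumed minimum count, keeps every probability positive


def log_score(counts, events):
    """Log-probability of aligned (feneme, arc) events.

    Arc probabilities are the counts normalized per feneme.
    """
    totals = Counter()
    for (feneme, _arc), c in counts.items():
        totals[feneme] += c
    return sum(math.log(counts[(f, a)] / totals[f]) for f, a in events)


def corrective_step(correct, incorrect, events, margin=1.0, step=0.5):
    """Adjust counts only if the two word scores are within `margin`.

    Returns True when an adjustment was made.  The update (add to the
    correct model, subtract from the incorrect one) is an illustrative
    assumption, not the rule used in the disclosure.
    """
    gap = log_score(correct, events) - log_score(incorrect, events)
    if gap >= margin:
        return False  # already well separated; leave parameters alone
    for event, n in Counter(events).items():
        correct[event] += step * n
        incorrect[event] = max(FLOOR, incorrect[event] - step * n)
    return True


# Two models that start out identical, so the recognizer cannot
# tell the words apart on the training events y.
s1 = {("f1", "a1"): 5.0, ("f1", "a2"): 5.0}
s2 = {("f1", "a1"): 5.0, ("f1", "a2"): 5.0}
y = [("f1", "a1"), ("f1", "a1"), ("f1", "a2")]

gap_before = log_score(s1, y) - log_score(s2, y)
adjusted = corrective_step(s1, s2, y)
gap_after = log_score(s1, y) - log_score(s2, y)
print(adjusted, gap_before, gap_after)
```

      In this toy case the two scores start out equal, and one step
moves the log-probability gap in favor of the correct word.  The
margin test is what restricts the adjustment to near ties, reflecting
the caution above that any adjustment may introduce new errors.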

      Corrective Training Algorithm - For notational convenience,
consider the problem of discriminating between two confusable words:
given some training data $Y$, the issue is to estimate the parameter
sets $S_1$ and $S_2$ of the multi-arc baseforms $B_1$ and $B_2$
representing, in some given acoustic context $C$, the two confusable
words of interest.  Assume, arbitrarily, that $B_1$ represents the
correct word and $B_2$ the incorrect one.  The corrective training
algorithm is described below.

0.  Initialization: using maximum likelihood, compute for $i = 1,2$ the
 approximate frequencies $\hat{c}_i(f,a)$ for each feneme $f$ in
 the baseform $B_i$ and each arc $a$ in the arc inventory.  In other
 words, find the sets of parameters $\hat{S}_i$ such that:

      $$\hat{S}_i = \arg\max_{S_i} \Pr_{S_i}(Y \mid B_i), \qquad i = 1,2.$$

      ...