
Smoothing Techniques for Hidden Markov Models Used in Automatic Speech Recognition

IP.com Disclosure Number: IPCOM000100486D
Original Publication Date: 1990-Apr-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 3 page(s) / 122K

Publishing Venue

IBM

Related People

Grice, DG: AUTHOR (and 4 others)

Abstract

Smoothing is a method, disclosed here, that attempts to give a broader representation of the various sounds comprising a particular state within a Markov model. The aim is to reduce the recognition errors that arise when the sounds in a spoken word are, for various reasons, similar to but slightly different from the sounds that occurred in the training data.

Smoothing Techniques for Hidden Markov Models Used in Automatic Speech Recognition

       Smoothing is a method, disclosed here, that attempts to give
a broader representation of the various sounds comprising a
particular state within a Markov model.  The aim is to reduce the
recognition errors that arise when the sounds in a spoken word are,
for various reasons, similar to but slightly different from the
sounds that occurred in the training data.

      To compensate for cluster references that may not appear in
the spoken word, an algorithm using the K-nearest-neighbor rule was
implemented.  The basic idea is to create a table, for each sound
cluster, of its N nearest neighbors and the distance to each of
these N "neighbors".  By this we essentially "smooth" the occurrence
likelihood matrix Bjk to account for the actual data (sounds)
encountered in the training process plus those sounds that are very
close but, for one reason or another, did not show up in the
training data.  Neighbors that are closer will, obviously, be given
a larger weight than more distant ones, and a distance threshold can
be chosen to eliminate all candidates that are too far off.  A
sketch of this table construction is given after the recognition
step below.

      Recognition is accomplished by computing the forward
probability of the input string of tokens against each of the
models and selecting the word whose model has the best score.  If
the input string of n tokens, O, is
                O = O1 O2 ... On
the recursive F-B (forward-backward) algorithm is stated as
                Zt = (Zt-1 x Aij) x Bjk[Ot].
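
      As referenced above, the following is a minimal sketch of the
neighbor-table construction, assuming each cluster reference is
represented by a centroid vector and compared by Euclidean distance.
The names build_neighbor_table, N_NEIGHBORS, and DIST_THRESHOLD, and
the specific values, are illustrative assumptions, not part of the
original disclosure.

    import numpy as np

    N_NEIGHBORS = 4        # assumed table size N
    DIST_THRESHOLD = 2.5   # assumed cutoff for candidates "too far off"

    def build_neighbor_table(centroids):
        # centroids: (num_clusters, dim) array of cluster centroids.
        # For each cluster j, record its N nearest neighbors as
        # (index, distance) pairs, dropping any beyond the threshold.
        table = {}
        for j in range(len(centroids)):
            dists = np.linalg.norm(centroids - centroids[j], axis=1)
            dists[j] = np.inf                  # exclude the cluster itself
            nearest = np.argsort(dists)[:N_NEIGHBORS]
            table[j] = [(int(k), float(dists[k])) for k in nearest
                        if dists[k] <= DIST_THRESHOLD]
        return table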

      Aij stands for the state transition matrix, while Bjk
represents the symbol occurrence probabilities for all clusters
across all states for each model.  In the case of a five-state
Markov model, Aij would be a five-by-five matrix.  If, for example,
your symbol library has 128 clusters, then Bjk would be 128 rows by
five columns, one column for each state of the Markov model.
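
      As a minimal sketch of this recursion with the stated shapes
(Aij five-by-five, Bjk 128 rows by five columns), where the function
name forward_score and the initial state distribution init are
assumptions for illustration:

    def forward_score(A, B, obs, init):
        # A:    (5, 5)   state-transition matrix Aij
        # B:    (128, 5) symbol-occurrence matrix Bjk, row per cluster
        # obs:  sequence of observed cluster indices O1 ... On
        # init: (5,)     assumed initial state distribution
        Z = init
        for o_t in obs:
            Z = (Z @ A) * B[o_t]   # Zt = (Zt-1 x Aij) x Bjk[Ot]
        return Z.sum()             # forward probability of the string

      Recognition then scores the token string against every word
model and keeps the best, e.g.
max(models, key=lambda m: forward_score(m.A, m.B, obs, m.init)).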

      The symbol likelihood probability is extracted from the Bjk
matrix, restricted to the row that corresponds to the observed
symbol; here Ot is the t-th observed cluster reference.  Suppose you
observed cluster reference CJ, which is very close to reference
CJ+1: you would extract the probability for reference CJ.  However,
if the training data contained many occurrences of CJ+1 and few or
none of CJ, then the match score would be very poor.  With this
method you extract the score not only for the observed symbol CJ,
but also for the N nearest neighbors of CJ, CJ+1 included.
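
      A sketch of this smoothed lookup, mixing the observed symbol's
row of Bjk with weighted rows of its near neighbors from the table
built earlier; the 1/(1+d) distance-to-weight mapping is an assumed
choice, since the disclosure says only that closer neighbors receive
a better weight:

    def smoothed_row(B, table, o_t):
        # Emission likelihoods for observed cluster o_t, mixed with
        # the rows of its near neighbors; closer neighbors weigh more.
        row = B[o_t].copy()
        total = 1.0
        for k, d in table[o_t]:
            w = 1.0 / (1.0 + d)    # assumed distance-to-weight mapping
            row = row + w * B[k]
            total += w
        return row / total         # renormalize the weighted mixture

      Using smoothed_row(B, table, o_t) in place of B[o_t] in the
recursion above yields a usable score even when the observed
reference was rare or absent in the training data.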

      The figure illustrates an area in "sound space" where several
clusters happen to be near to one another.  A centroid of a...