Variable Length Markov Baseforms for Voice Command Recognition

IP.com Disclosure Number: IPCOM000112631D
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 68K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+5]

Abstract

A method is disclosed to generate and use Markov baseforms whose length is optimized to better represent the various voice commands.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Fullword model baseforms are very often represented as a series
of concatenated elementary cells.  A common approach in such models
is to use a fixed number of cells for each word in the vocabulary,
e.g., 5 cells as in reference [*].  Each cell therefore models
roughly one-fifth of the word.

      This approach has the drawback that the quality of the model
depends on the length of the word: short words are modeled more
precisely than long words.  In a short word, each cell models a short
segment of speech, so the production probability distributions for
each cell are sharp, representing the few speech events that occur in
the short period being modeled.  In a long word, each cell models a
proportionately longer segment of speech, and the corresponding
probability distributions must cover a greater variety of speech
events, which makes them less sharp.  The result is poorer models for
the longer words.
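
      To make the effect concrete, the following minimal sketch
computes how many frames of speech each cell must cover under a fixed
5-cell baseform.  The 10 ms frame period and the two word durations
are illustrative assumptions, not figures from the disclosure.

```python
# Illustrative assumption: one acoustic frame every 10 ms (not from the disclosure).
FRAME_MS = 10
FIXED_CELLS = 5  # fixed baseform length, matching the 5-cell example in the text


def frames_per_cell(duration_ms: float, n_cells: int = FIXED_CELLS) -> float:
    """Average number of frames each cell must model for a word of this duration."""
    return (duration_ms / FRAME_MS) / n_cells


# Hypothetical word durations in milliseconds.
short_word = frames_per_cell(250)    # 25 frames spread over 5 cells
long_word = frames_per_cell(1000)    # 100 frames spread over 5 cells
print(short_word, long_word)
```

      Under these assumed durations, each cell of the long word
covers four times as much speech as a cell of the short word, which
is why its distributions would be expected to be less sharp.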

      Increasing the length of all the models, say, from 5 to 10
cells, will improve the models for the long words, but will create a
new problem for short words, because more training data will be
required to adequately train the short word models.

      The method disclosed here assigns baseforms of different
lengths to the different words of the vocabulary.  The length chosen
for each word is proportional to the average observed duration of
that word's utterances in the training data.
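
      The length selection might be sketched as follows.  This is
only an illustration under stated assumptions: the disclosure says
the length is proportional to the average duration but does not give
the constant, so FRAMES_PER_CELL, MIN_CELLS, the function name, and
the training durations below are all hypothetical.

```python
from statistics import mean

# Hypothetical tuning constants; the disclosure states only proportionality.
FRAMES_PER_CELL = 5   # assumed target number of frames modeled by one cell
MIN_CELLS = 1         # assumed floor so that every word gets at least one cell


def baseform_length(durations_in_frames):
    """Number of cells for a word, proportional to its mean training duration."""
    avg = mean(durations_in_frames)
    return max(MIN_CELLS, round(avg / FRAMES_PER_CELL))


# Hypothetical durations (in frames) of the training utterances of two commands.
training = {
    "go": [22, 25, 28],
    "terminate": [95, 100, 105],
}
lengths = {word: baseform_length(d) for word, d in training.items()}
print(lengths)
```

      With these assumed values, the short command receives a
5-cell baseform and the long one a 20-cell baseform, so each cell
covers roughly the same amount of speech regardless of word length.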

      The advantage over the fixed-length approach is that the
accuracy of the model does not dep...