Adaptive Interpolation of Probabilistic Models

IP.com Disclosure Number: IPCOM000110278D
Original Publication Date: 1992-Nov-01
Included in the Prior Art Database: 2005-Mar-25
Document File: 3 page(s) / 70K

Publishing Venue

IBM

Related People

Merialdo, B: AUTHOR

Abstract

Probabilistic models are among the most useful tools in speech recognition, at both the acoustic and the linguistic levels. To model complex phenomena, these models need a high number of parameters; but then they require large amounts of training data to obtain reliable estimates.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Adaptive Interpolation of Probabilistic Models

      In order to reach the best compromise between the number of
parameters and the limited amount of training data available, the
models generally used are built as combinations of several models
that differ in scope: some have a high number of parameters, many of
which are badly estimated; others have fewer, but better estimated,
parameters.  For example, the standard 3-gram language model, which
predicts the probability of a word given the two preceding words of
the sentence, is built as an interpolation of the 2nd-order,
1st-order, 0th-order and uniform frequencies:

           P(w3 | w1 w2) = lambda_2 f(w3 | w1 w2)
                         + lambda_1 f(w3 | w2)
                         + lambda_0 f(w3)
                         + lambda_u u(w3)

where u(w3) is the uniform distribution over the set of words.  The
coefficients lambda_i are positive and sum to 1; they generally
depend on the count of the bigram (w1 w2).
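The interpolation scheme above can be sketched in a few lines of Python.  The toy corpus, the particular lambda values, and the helper names (interp_prob, lambdas_for, and so on) are illustrative assumptions, not taken from the disclosure; the f(.) terms are plain relative (maximum-likelihood) frequencies.

```python
from collections import Counter

corpus = "a b a b c a b a c b a b".split()
vocab = sorted(set(corpus))
total = len(corpus)

uni = Counter(corpus)
bigrams = list(zip(corpus, corpus[1:]))
trigrams = list(zip(corpus, corpus[1:], corpus[2:]))
bi = Counter(bigrams)
tri = Counter(trigrams)
ctx1 = Counter(a for a, _ in bigrams)           # contexts of f(w3 | w2)
ctx2 = Counter((a, b) for a, b, _ in trigrams)  # contexts of f(w3 | w1 w2)

def f2(w1, w2, w3):          # 2nd-order relative frequency f(w3 | w1 w2)
    return tri[(w1, w2, w3)] / ctx2[(w1, w2)] if ctx2[(w1, w2)] else 0.0

def f1(w2, w3):              # 1st-order relative frequency f(w3 | w2)
    return bi[(w2, w3)] / ctx1[w2] if ctx1[w2] else 0.0

def f0(w3):                  # 0th-order (unigram) relative frequency
    return uni[w3] / total

def u(w3):                   # uniform distribution over the vocabulary
    return 1.0 / len(vocab)

def lambdas_for(w1, w2):
    # The coefficients depend on the count of the bigram (w1 w2): the
    # more often the context was seen, the more the 3-gram frequency is
    # trusted.  These particular values are arbitrary illustrative choices.
    c = bi[(w1, w2)]
    if c >= 3:
        return (0.6, 0.25, 0.1, 0.05)
    if c >= 1:
        return (0.3, 0.4, 0.2, 0.1)
    return (0.0, 0.4, 0.4, 0.2)

def interp_prob(w1, w2, w3):
    l2, l1, l0, lu = lambdas_for(w1, w2)
    return l2 * f2(w1, w2, w3) + l1 * f1(w2, w3) + l0 * f0(w3) + lu * u(w3)

# The interpolated model is a proper distribution: each component sums
# to 1 over the vocabulary, and the lambdas sum to 1.
p = sum(interp_prob("a", "b", w) for w in vocab)
```

Because the uniform term gives every word nonzero probability, the interpolated model never assigns probability zero, even to trigrams unseen in training.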

      The problem is that the interpolation coefficients are computed
once and for all during the training phase, which may not always be
efficient.

      This problem arises because the choice of the interpolation
coefficients is frozen at training time.  To improve the situation,
the coefficients are modified during the decoding phase, so that they
can adapt to the current situation.  This allows the combined model
to better capture the variability of the situations as it encounters
them.
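The disclosure's actual update rule lies in the truncated portion of the text; purely as an illustration of adapting the coefficients during decoding, here is a generic online EM-style re-estimation step, with stand-in component probabilities rather than real data.

```python
def adapt(lambdas, components, eta=0.1):
    """One adaptation step after observing a word.

    lambdas    -- current interpolation coefficients (sum to 1)
    components -- each component model's probability for the observed word
    eta        -- step size toward the posterior responsibilities
    """
    total = sum(l * c for l, c in zip(lambdas, components))
    # Posterior share of each component in explaining the observation.
    resp = [l * c / total for l, c in zip(lambdas, components)]
    # Move the coefficients a small step toward the responsibilities;
    # a convex combination of two distributions still sums to 1.
    return [(1 - eta) * l + eta * r for l, r in zip(lambdas, resp)]

lambdas = [0.6, 0.25, 0.1, 0.05]
# Suppose the 1st-order model keeps explaining the observed words best:
for _ in range(50):
    lambdas = adapt(lambdas, [0.05, 0.60, 0.20, 0.10])
```

Over repeated observations, the weights drift toward whichever component models predict the current text best, which is the adaptive behaviour the disclosure describes.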

      Let us st...