Fast Acoustic Prototype Adaptation by a Maximum Ultimately Deeply Discounted Likelihood Estimate (Muddle) Via the EM Algorithm

IP.com Disclosure Number: IPCOM000121132D
Original Publication Date: 1991-Jul-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 8 page(s) / 282K

Publishing Venue

IBM

Related People

Nadas, A: AUTHOR [+3]

Abstract

An algorithm is proposed for continued prototype adaptation after initial training. It is designed to solve two problems: (1) how to combine old and new data and (2) how to avoid the computational burden of retraining prototypes.

      We distinguish two types of talker adaptation: initial and
continued.  In the initial adaptation problem, reference speech is
used to augment speech from a new talker, and training is then based
on the combined data, on combined statistics, or on some other use
of the two datasets.  The initial adaptation problem has been
considered elsewhere and is not our concern here.  This invention is
an algorithm for continued prototype adaptation in conjunction with
decoding, where the talker's own past statistics are available and
where time is at a premium.  In particular, we do not wish to keep,
and continually retrain on, substantial amounts of speech.  The
algorithm proceeds as follows:
1.   Align the speech of the decoded sentence to fenemic baseforms
and sort the spectral vectors into prototype classes.  Let
y1,...,yNnew denote the Nnew vectors belonging to any one prototype.
Let Nold denote the total sample size for this prototype processed
prior to the current sentence.
2.   Re-estimate each prototype for which there is new data by
maximizing the discounted approximate log-likelihood of the
prototype parameters, based in a certain way on all Nold + Nnew
vectors.  The log-likelihood in question will be the sum of a
discounted first-order (linear) approximation based on certain
statistics retained from the old data and a term that is an EM-type
approximation of the contribution of the new data to the total
log-likelihood.
3.   Using the new prototypes, label and decode the next sentence.
A code sketch of this loop appears below.
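
      As a rough illustration only, here is a minimal Python sketch
of steps 1-3 for the simplified case of one diagonal-covariance
Gaussian per prototype.  The disclosure works with Gaussian
mixtures, and every name below (decode, align, the discount factor,
and so on) is a hypothetical stand-in, not part of the disclosure:

    import numpy as np

    def adapt_prototype(mean, var, n_old, y_new, discount=0.95):
        # Closed-form discounted ML update for a single diagonal
        # Gaussian: the old sufficient statistics are down-weighted
        # by "discount" before being pooled with the new vectors.
        n_new = len(y_new)
        w_old = discount * n_old
        w_tot = w_old + n_new
        new_mean = (w_old * mean + y_new.sum(axis=0)) / w_tot
        old_e2 = var + mean ** 2          # old second moment E[y^2]
        new_e2 = (y_new ** 2).sum(axis=0)
        new_var = (w_old * old_e2 + new_e2) / w_tot - new_mean ** 2
        return new_mean, new_var, w_tot

    def continued_adaptation(sentences, prototypes, decode, align):
        # Steps 1-3: decode with the current prototypes, align and
        # sort the spectral vectors into prototype classes, then
        # re-estimate each class for which there is new data.
        for speech in sentences:
            text = decode(speech, prototypes)
            for k, vectors in align(speech, text).items():
                if vectors:
                    m, v, n = prototypes[k]
                    prototypes[k] = adapt_prototype(
                        m, v, n, np.asarray(vectors))

      In the mixture case the closed-form update is replaced by the
double-loop maximization sketched at the end of this disclosure.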

      We describe the algorithm in terms of a single phone for which
we have processed Nold previous vectors and for which we have Nnew =
1 new vector (for general Nnew the relevant formulas become sums of
Nnew terms of the same form). These details are presented as follows:
1.   Summary of the re-estimation algorithm.  This will introduce
the vectors r, S, A, Q1, and the matrices H, Q11, whose definitions
will appear later.
2.   We next describe the incomplete data model f(y|r); this is
simply the multivariate Gaussian mixture model defining one
prototype.  We pretend that the data are an IID sample from this
prototype.  A code sketch of this density appears after the list.
3.   Next we give its logarithmic derivative, the incomplete score
vector (gradient) S(y,r) = ∇r log f(y|r); the usual MLE is a root of
the sum (over the data) of these scores.
4.   Then we describe the second-order logarithmic derivative, the
Hessian matrix H(y,r) = ∇r S(y,r), which is needed for the local
linear approximation of the old data's contribution to the
log-likelihood; the relevant expansion is written out after the
list.
5.   We then describe the maximization problem and solve it by a
double-loop iteration via an EM-like outer loop...
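
      For concreteness, here is a minimal sketch of the
incomplete-data density of item 2, assuming diagonal covariances;
the parameter names (weights, means, variances) are hypothetical,
since the extract does not fix them:

    import numpy as np

    def log_f(y, weights, means, variances):
        # log f(y|r) for one prototype, where r = (weights, means,
        # variances) parameterizes a K-component diagonal-covariance
        # Gaussian mixture; y has shape (D,), weights shape (K,),
        # means and variances shape (K, D).
        log_comp = -0.5 * (np.log(2.0 * np.pi * variances)
                           + (y - means) ** 2 / variances).sum(axis=1)
        a = np.log(weights) + log_comp
        m = a.max()               # log-sum-exp, numerically stable
        return m + np.log(np.exp(a - m).sum())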
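
      Items 3 and 4 can be written out explicitly.  The following is
standard mixture-model calculus consistent with the extract's
notation, not a quotation from the disclosure:

    \[
    S(y,r) = \nabla_r \log f(y\mid r), \qquad
    H(y,r) = \nabla_r S(y,r) = \nabla_r^2 \log f(y\mid r) .
    \]

For example, the mean block of the score, with responsibilities
$p_k(y) = w_k\,\phi(y;\mu_k,\Sigma_k) / f(y\mid r)$, is

    \[
    \nabla_{\mu_k} \log f(y\mid r) = p_k(y)\,\Sigma_k^{-1}\,(y - \mu_k) .
    \]

Expanding the old-data score linearly about the previous estimate
$r_0$ (the local linear approximation of item 4) gives

    \[
    \sum_{i \in \mathrm{old}} S(y_i, r) \approx
    S_{\mathrm{old}} + H_{\mathrm{old}}\,(r - r_0),
    \qquad
    S_{\mathrm{old}} = \sum_i S(y_i, r_0), \quad
    H_{\mathrm{old}} = \sum_i H(y_i, r_0),
    \]

so only these retained aggregates, and not the Nold old vectors
themselves, need be stored.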
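
      The abbreviated extract breaks off in item 5.  Purely as an
illustration of such a double-loop maximizer (the Newton inner step,
the discount factor gamma, and the callables new_grad and new_hess,
which stand for the EM-type gradient and Hessian of the new-data
term, are all assumptions, not the disclosure's method):

    import numpy as np

    def maximize_discounted(r0, S_old, H_old, new_grad, new_hess,
                            gamma=0.95, n_outer=10):
        # Outer loop: EM-like re-linearization of the new-data term
        # at the current parameter value.  Inner step: one Newton
        # update of the stationarity condition
        #   gamma * (S_old + H_old (r - r0)) + new-data score = 0.
        r = r0.copy()
        for _ in range(n_outer):
            g = gamma * (S_old + H_old @ (r - r0)) + new_grad(r)
            H = gamma * H_old + new_hess(r)
            r = r - np.linalg.solve(H, g)
        return r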