Speech Recognition with Context Dependent Full Covariance Gaussian Acoustic Prototypes

IP.com Disclosure Number: IPCOM000110182D
Original Publication Date: 1992-Sep-01
Included in the Prior Art Database: 2005-Mar-25
Document File: 2 page(s) / 103K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

In one prominent approach to speech recognition, acoustic parameter vectors are modelled as context-dependent mixtures of diagonal Gaussian distributions. Typically, each mixture consists of only 1 or 2 Gaussians, as it is impractical to estimate the parameters for larger numbers, given limited training data. Experimental evidence demonstrates that a single full-covariance Gaussian is a better acoustic model than a mixture of 2 diagonal Gaussians in the high-dimensional spaces usually found in speech recognition, but the covariance matrices cannot be estimated reliably from any reasonable amount of training data. The invention below allows a set of context-dependent full-covariance prototypes to be constructed in such a way that each prototype involves no more parameters than does a single diagonal Gaussian prototype.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Using a diagonal Gaussian prototype instead of a
full-covariance prototype is equivalent to assuming that the
correlation matrix of the full-covariance Gaussian is known a
priori to be the identity matrix.  Once this assumption is made,
there are no prototype correlations to estimate, only means and
variances.  In this invention we shall assume that the correlation
matrix is known a priori, so that only means and variances have to be
estimated, but we shall not assume that the correlation matrix is the
identity.  Thus, instead of representing prototypes as diagonal
Gaussians, we shall model them as full-covariance Gaussians, but
since only means and variances have to be estimated, there are no
more parameters to estimate than the number required for a single
diagonal Gaussian.
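
      The parameter accounting above can be sketched numerically: writing
the covariance as Sigma = D R D, where D is the diagonal matrix of
per-dimension standard deviations and R is a correlation matrix fixed a
priori, only the means and standard deviations remain to be estimated.
The following is a minimal illustration, not the disclosure's
implementation; the dimension and the particular R are arbitrary
examples.

```python
import numpy as np

def full_covariance(stddevs, correlation):
    """Assemble Sigma = D R D from per-dimension standard deviations
    and a correlation matrix that is fixed a priori."""
    D = np.diag(stddevs)
    return D @ correlation @ D

d = 3  # illustrative dimension
# Correlation matrix assumed known a priori (an arbitrary example here;
# a diagonal-Gaussian prototype corresponds to R = identity).
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
sigma = np.array([2.0, 1.0, 0.5])  # estimated per-dimension std devs
Sigma = full_covariance(sigma, R)

# With R fixed, each prototype has d means + d variances of free
# parameters -- the same count as a single diagonal Gaussian prototype.
print(Sigma[0, 0], Sigma[0, 1])
```

Note that the off-diagonal entries of Sigma are fully determined by the
fixed R once the variances are known, which is exactly why no extra
parameters need estimating.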

      It will be assumed that some training data is available from
several different speakers.  The procedure is as follows:
(1)  Perform Steps 2-5 for each class C in the alphabet of
context-dependent classes.
(2)  Perform Steps 3-4 for each speaker S in the training speakers.
(3)  Extrac...
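
      The extracted text breaks off in Step 3, so the remaining steps are
not available here. As a hedged illustration only, the loop structure
shown (over context-dependent classes and over training speakers)
suggests pooling per-speaker statistics; one plausible way to obtain a
shared correlation matrix from multi-speaker data is sketched below. The
function name, the standardize-then-pool rule, and the toy data are
assumptions for illustration, not the disclosure's stated procedure.

```python
import numpy as np

def pooled_correlation(speaker_data):
    """Hypothetical sketch: standardize each speaker's acoustic vectors
    with that speaker's own per-dimension mean and standard deviation,
    pool the standardized vectors, and take the correlation matrix of
    the pool. speaker_data is a list of (n_samples, d) arrays."""
    standardized = []
    for X in speaker_data:
        mu = X.mean(axis=0)
        sd = X.std(axis=0)
        standardized.append((X - mu) / sd)
    Z = np.vstack(standardized)
    return np.corrcoef(Z, rowvar=False)

rng = np.random.default_rng(0)
# Toy "training data" from three speakers (purely illustrative).
data = [rng.normal(size=(50, 4)) for _ in range(3)]
R = pooled_correlation(data)
print(R.shape)  # a 4 x 4 correlation matrix with unit diagonal
```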