
Tied Mixture Continuous Parameter Modelling for Speech Recognition

IP.com Disclosure Number: IPCOM000037268D
Original Publication Date: 1989-Dec-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 4 page(s) / 80K

Publishing Venue

IBM

Related People

Bellegarda, JR: AUTHOR [+2]

Abstract

Discrete and continuous parameter approaches to the acoustic-modelling problem in automatic speech recognition are unified through a class of general hidden Markov models, whose output probability distributions are specified using tied mixtures of simple multivariate densities. Speech recognition experiments performed on large vocabulary office correspondence tasks demonstrate some of the resulting benefits.

This is the abbreviated version, containing approximately 42% of the total text.



Acoustic channel modelling is a crucial problem in an automatic speech recognition system, such as the one described in [1]. If AT denotes the acoustic sequence corresponding to the sequence of words uttered, WT, the problem is to find an appropriate hidden Markov model for the quantity Pr(AT | WT), so as to represent the speech waveform in a parsimonious and meaningful fashion. More specifically, the goal of acoustic modelling is to isolate a class of such models flexible enough to preserve the information necessary for good recognition, yet simple enough to be computationally tractable.

The solution outlined below evolves from the unification of two traditional approaches to the problem. Suppose that the front-end processor extracts from the speech waveform one vector of acoustic parameters per frame, such as the energy in each of D spectral bands. In the discrete parameter approach, the resulting sequence of vectors is vector-quantized into a string of labels, which is then assigned a (non-parametric) multinomial probability distribution; as a result, a severe loss of information about the original speech waveform may occur. In contrast, in the continuous parameter approach, the sequence of acoustic parameter vectors is assigned a multivariate probability distribution directly, most often Gaussian, for cost effectiveness; since, however, a single Gaussian distribution can only model unimodal behavior, this may lead to gross inaccuracies in the resulting model [2,3]. In its most general form, the new class of hidden Markov models presented here allows continuous parameter modelling based on tied mixtures of simple (unimodal) probability distributions. This compromise tends to reduce the modelling inaccuracies arising from a single unimodal distribution, while at the same time retaining some of the flexibility of the discrete model.
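As an illustration, a tied-mixture output density of the kind described above can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the codebook values and mixture weights are hypothetical, and diagonal-covariance Gaussians are assumed for simplicity. The key point of the tying is that the codebook of component densities is shared across all states, while only the mixture weights are state-specific.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance multivariate Gaussian at vector x."""
    log_p = -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )
    return math.exp(log_p)

def tied_mixture_density(x, weights, codebook):
    """Output density b(x) = sum_k w_k N(x; mu_k, var_k).

    The codebook of (mean, variance) pairs is shared (tied) across all
    states; only the mixture weights `weights` are state-specific.
    """
    return sum(w * gaussian_pdf(x, mu, var)
               for w, (mu, var) in zip(weights, codebook))

# Hypothetical 2-dimensional codebook with K = 2 tied Gaussians.
codebook = [([0.0, 0.0], [1.0, 1.0]),
            ([3.0, 3.0], [1.0, 1.0])]
state_weights = [0.7, 0.3]   # state-specific tying weights, sum to 1
p = tied_mixture_density([0.1, -0.2], state_weights, codebook)
```

Because the mixture is a convex combination of unimodal densities, a state can exhibit multimodal output behavior without abandoning the parametric continuous model, which is exactly the compromise described above.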

General Hidden Markov Models: Assume that the acoustic evidence AT is the observed output of a hidden Markov model, M. If we denote by xn the state of the underlying Markov chain at time n, M admits the following set, S, of parameters: initial probabilities bi = Pr(x0 = i), transition probabilities aij = Pr(xn = j | xn-1 = i), and output probabilities Pr(Tn | aij) = Pr(Tn | xn = j, xn-1 = i). Note the double meaning of Tn: in the discrete case, Tn refers to the label at time n (a scalar); otherwise, Tn is the acoustic parameter vector at time n. In the latter case, the function Pr(· | aij) is defined on a continuous space, and represents the out...
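The likelihood of an observation sequence under this parameter set can be evaluated with the standard forward recursion. The sketch below is a minimal illustration, not the disclosed system: the output probabilities are attached to arcs, matching the arc-conditioned output probabilities of the parameter set S above, and the toy two-state discrete parameters at the bottom are entirely hypothetical.

```python
def forward_likelihood(obs, initial, trans, out_prob):
    """Pr(obs | model) via the forward recursion.

    initial[i]        = Pr(x_0 = i)
    trans[i][j]       = Pr(x_n = j | x_{n-1} = i)
    out_prob(o, i, j) = probability of output o on the arc from
                        state i to state j (arc-based outputs)
    """
    n_states = len(initial)
    # alpha[j] accumulates Pr(o_1..o_n, x_n = j); start from the
    # initial state distribution, then fold in one observation per step.
    alpha = [initial[j] for j in range(n_states)]
    for o in obs:
        alpha = [
            sum(alpha[i] * trans[i][j] * out_prob(o, i, j)
                for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)

# Hypothetical 2-state model with discrete outputs {0, 1}; here the
# output probability depends only on the destination state j.
initial = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
emit = {0: [0.9, 0.2], 1: [0.1, 0.8]}   # emit[o][j] = Pr(o | state j)
like = forward_likelihood([0, 1], initial, trans,
                          lambda o, i, j: emit[o][j])
```

In the tied-mixture case, the discrete lookup `out_prob` would simply be replaced by a continuous arc-conditioned mixture density; the recursion itself is unchanged.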