
Probabilistic Neural Functions of Markov Chains (Neural HMMs)

IP.com Disclosure Number: IPCOM000102668D
Original Publication Date: 1990-Dec-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 4 page(s) / 140K

Publishing Venue

IBM

Related People

Burshtein, D: AUTHOR [+3]

Abstract

For continuous parameter speech modeling it is shown how to construct and train probabilistic functions of a Markov chain (HMM) using (deterministic) neural nets (multilayer perceptrons).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 47% of the total text.

Probabilistic Neural Functions of Markov Chains (Neural HMMs)

       For continuous parameter speech modeling it is shown how
to construct and train probabilistic functions of a Markov chain
(HMM) using (deterministic) neural nets (multilayer perceptrons).

      The current practice in HMM modeling is to assume that the
outputs are mutually independent given the path.  We now propose what
is essentially a nonparametric (= many parameters) approach to the
problem.  For practical purposes we shall retain the multivariate
Gaussian assumption for the conditional distribution of a timeslice
given the path and the previous vectors; however, these conditional
distributions will be connected by highly nonlinear regressions, so
that the joint distribution of output vectors given a path is modeled
far more flexibly than is possible with a Gaussian process, which is
incapable of anything but linear regressions.
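
      As a concrete illustration of this contrast, the following
sketch (hypothetical names, NumPy only, a fixed covariance for
brevity) evaluates the conditional log-density of a frame y_t under
(a) a linear regression on the previous k frames, the only regression
a Gaussian process can express, and (b) a small multilayer perceptron
supplying a nonlinear conditional mean.

import numpy as np

def gaussian_logpdf(y, mean, cov):
    # Log-density of a multivariate Gaussian N(mean, cov) at y.
    d = y.size
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def linear_mean(past, A, b):
    # Linear regression mean; past = the last k frames stacked into one vector.
    return A @ past + b

def mlp_mean(past, W1, b1, W2, b2):
    # Nonlinear (MLP) regression mean with one tanh hidden layer.
    h = np.tanh(W1 @ past + b1)
    return W2 @ h + b2

# Tiny example: 2-dimensional frames, k = 2 previous frames.
rng = np.random.default_rng(0)
d, k = 2, 2
past = rng.standard_normal(d * k)      # y_{t-1} and y_{t-2}, stacked
y_t = rng.standard_normal(d)
cov = np.eye(d)                        # fixed covariance for brevity

A, b = rng.standard_normal((d, d * k)), np.zeros(d)
W1, b1 = rng.standard_normal((8, d * k)), np.zeros(8)
W2, b2 = rng.standard_normal((d, 8)), np.zeros(d)

print(gaussian_logpdf(y_t, linear_mean(past, A, b), cov))
print(gaussian_logpdf(y_t, mlp_mean(past, W1, b1, W2, b2), cov))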

                            (Image Omitted)

      Let X = {X_t | t = 1, ..., T} be a finite Markov chain and let
Y = {Y_t | t = 1, ..., T'} be a probabilistic function of it.  It is
possible to have T ≠ T' by means of self loops and null transitions;
this will be ignored here.  We assume that (X,Y) are jointly
distributed random processes such that the conditional probability of
the output Y given a path X = x has the form

      P(Y = y | X = x) = \prod_{t=1}^{T} q_{x_{t-1} x_t}(y_t | y_{t-1}, ..., y_{t-k}),

where the a_{ij} are the transition probabilities of the chain and
where each q_{ij} is an n-dimensional probability density.  In other
words the chain is a first order Markov chain while the output is a
k-th order Markovian vector process when conditioned on the chain and
depends on the chain only through one step transitions.  We shall
denote the conditional regression functions given the chain (= the
conditional expectation of the next output vector given the past) by

      m_{ij}(y_{t-1}, ..., y_{t-k}) = E[Y_t | Y_{t-1} = y_{t-1}, ..., Y_{t-k} = y_{t-k}, X_{t-1} = i, X_t = j].
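
      A minimal sketch of this conditional structure (hypothetical
names, Python/NumPy; the densities q are supplied as callables keyed
by the transition taken): given a fixed path x, the conditional
log-probability of the output sequence factors over the transitions,
each term looking back at most k frames.

import numpy as np

def path_conditional_loglik(Y, x, q, k):
    # log P(Y = y | X = x) = sum_t log q_{x[t-1] x[t]}(y_t | last k frames).
    #   Y : array of shape (T, n), the output vectors y_1, ..., y_T
    #   x : list of states x_0, x_1, ..., x_T (one transition per output)
    #   q : dict mapping a transition (i, j) to a callable
    #       q[(i, j)](y_t, past) -> density value
    #   k : order of the output process
    total = 0.0
    for t in range(Y.shape[0]):
        past = Y[max(0, t - k):t][::-1]   # y_{t-1}, ..., down to y_{t-k}
        total += np.log(q[(x[t], x[t + 1])](Y[t], past))
    return total

# Usage with a trivial density that ignores its arguments (illustration only).
uniform_like = lambda y, past: 0.5
Y = np.zeros((3, 2))
x = [0, 0, 1, 1]                          # x_0, x_1, x_2, x_3
q = {(0, 0): uniform_like, (0, 1): uniform_like, (1, 1): uniform_like}
print(path_conditional_loglik(Y, x, q, k=2))   # 3 * log(0.5)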

      There are many ways to introduce neural nets into this model.
A very general way would be to take

      q_{ij}(y_t | y_{t-1}, ..., y_{t-k}) = f_{ij}(y_t | y_{t-1}, ..., y_{t-k}; r_{ij}),

where f_{ij} is a neural net constrained to be a density in y_t,
having weight parameters r_{ij}.  The joint density of the
observables is then

      P_r(y_1, ..., y_T) = \sum_{x} \prod_{t=1}^{T} a_{x_{t-1} x_t} f_{x_{t-1} x_t}(y_t | y_{t-1}, ..., y_{t-k}; r_{x_{t-1} x_t}).

Regarded as a function L(r) of the model parameters r, this is the
likelihood function whose maximum would furnish the MLE.  However,
such a maximization in a practical speech recognition problem is
likely to be too hard because of difficulties in enforcing the
probabilistic constraints.  Hence, we propose the following
simplifications.
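
      Although the sum above runs over all state paths, it can still
be evaluated by the usual forward recursion, because the extra
conditioning is only on past output vectors that are observed.  The
sketch below (hypothetical names, NumPy only) computes this
likelihood for transition-dependent densities supplied as callables;
it does not address the harder issue raised above, namely
constraining each f to integrate to one.

import numpy as np

def likelihood_forward(Y, pi, A, f, k):
    # P_r(y_1, ..., y_T) by the forward recursion.
    #   Y  : array (T, n) of output vectors
    #   pi : array (S,) of initial state probabilities for x_0
    #   A  : array (S, S) of transition probabilities a_ij
    #   f  : dict mapping (i, j) to a callable f[(i, j)](y_t, past) -> density
    #   k  : order of the output process
    S, T = len(pi), Y.shape[0]
    alpha = np.zeros(S)                   # alpha[j] = P(y_1..y_t, X_t = j)
    for t in range(T):
        past = Y[max(0, t - k):t][::-1]   # observed previous frames
        new = np.zeros(S)
        for j in range(S):
            for i in range(S):
                prev = pi[i] if t == 0 else alpha[i]
                new[j] += prev * A[i, j] * f[(i, j)](Y[t], past)
        alpha = new
    return alpha.sum()

      In practice the recursion would be carried out in the log
domain, or alpha rescaled at every step, to avoid numerical underflow
on long utterances.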

      Model each q_{ij} by choosing a member of a parametric family
{p_g | g in G} indexed by a parameter set G, where the parameter is
chosen by a neural function of the past.  For example, p_g(y_t) can
be taken as a multivariate Gaussian density whose mean vector and
covariance matrix constitute the parameter g.  Now let

      lambda = lambda(y_{t-1}, ..., y_{t-k}; r_{ij})

be a neural net function that chooses the parameter lambda based on
the last k vectors.  The function may be specific to the transition
being taken by the chain, and it has weight parameters r_{ij}
specific to the transition also.  Let r denote the collection of all
weight parameters.  T...
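
      A minimal sketch of this simplification (hypothetical names,
NumPy only; a diagonal covariance is assumed for brevity, so that the
parameter g reduces to a mean vector and a vector of log-variances):
a per-transition single-hidden-layer net maps the last k frames to g,
and p_g is then evaluated at y_t.

import numpy as np

def neural_gaussian_logdensity(y_t, past, r_ij, d):
    # log q_ij(y_t | past): a net with transition-specific weights r_ij
    # maps the stacked past frames to g = (mean, log-variances) of a
    # diagonal Gaussian, which is then evaluated at y_t.
    W1, b1, W2, b2 = r_ij
    h = np.tanh(W1 @ past + b1)           # hidden layer on the past frames
    g = W2 @ h + b2                       # 2*d entries: mean and log-variances
    mean, logvar = g[:d], g[d:]
    return -0.5 * np.sum(np.log(2.0 * np.pi) + logvar
                         + (y_t - mean) ** 2 / np.exp(logvar))

# Weights r_ij for one transition: 2-dimensional frames, k = 2, 8 hidden units.
rng = np.random.default_rng(1)
d, k, H = 2, 2, 8
r_ij = (rng.standard_normal((H, d * k)), np.zeros(H),
        rng.standard_normal((2 * d, H)), np.zeros(2 * d))
past = rng.standard_normal(d * k)         # y_{t-1} and y_{t-2}, stacked
y_t = rng.standard_normal(d)
print(neural_gaussian_logdensity(y_t, past, r_ij, d))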