Method and System for Maximum Accept and Reject (MARS) Training of HMM-GMM Speech Recognition Systems

IP.com Disclosure Number: IPCOM000199354D
Publication Date: 2010-Aug-31
Document File: 6 page(s) / 171K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system for maximum accept and reject (MARS) training of Hidden Markov Model and Gaussian Mixture Model (HMM-GMM) based speech recognition systems is disclosed. The method and system take frame-level phoneme classification errors into account while learning the parameters of phoneme distributions.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 50% of the total text.

Disclosed is a method and system for maximum accept and reject (MARS) training of Hidden Markov Model and Gaussian Mixture Model (HMM-GMM) based speech recognition systems.

Various HMM-GMM training techniques that include a discriminative criterion are in use, such as Maximum Mutual Information (MMI), Minimum Classification Error (MCE), and Minimum Phone Error (MPE). Similarly, hybrid Artificial Neural Network/Hidden Markov Model (ANN-HMM) speech recognition systems incorporate discriminative training by using neural networks. However, these techniques require a language model (lattice) to identify confusable segments of speech in the form of denominator state occupation statistics. Thus, these techniques are coupled to a specific language model.

To overcome this dependence on a specific language model, the disclosed method and system utilize a discriminative objective function for estimating the HMM parameters. The function considers frame-level errors to supplement the Maximum Likelihood (ML) optimization function. Specifically, the ML optimization function is supplemented with the emission/accept likelihood of the aligned state (phone) and the rejection likelihoods from the remaining states (phones).

An ML function may be expressed as:

(1)

Expression 1 is defined in terms of the set of all the HMM-GMM parameters and the sequence of feature vectors corresponding to an utterance with a correct phonetic transcription.
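
Expression 1 was an image and is not reproduced in this text. As a hedged illustration only, the conventional ML training criterion that matches the quantities named above can be written as follows. The symbols are assumptions introduced here, not taken from the original: \Lambda for the set of all HMM-GMM parameters, O for the sequence of feature vectors, and W for the correct phonetic transcription.

```latex
% Conventional ML criterion (assumed notation, not the original expression 1)
\hat{\Lambda}_{\mathrm{ML}} = \arg\max_{\Lambda} \, \log p(O \mid W, \Lambda)
```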

To supplement the ML function, the discriminative objective function for MARS training is given as follows:

(2)

Expression 2 is defined in terms of the state aligned at a given time, the frame aligned at that time, the correct state sequence as per the phonetic transcription, the total number of states in the HMMs being trained, and an empirical factor that controls the influence of the accept and the reject likelihoods.

Expression 2 contains an emission/accept likelihood and a rejection likelihood, the latter formed as a reciprocal. It can be observed that the reject probability distribution function (pdf) is the same as the accept pdf, but it appears in the denominator. Thus, discrimination between the states may be improved, because both accept pdfs and reject pdfs are utilized for each state. Further, the function takes into account how well the other states reject the current frame that has been aligned with a given state.
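
As a rough illustration of the accept and reject likelihoods described above, the following Python sketch scores a frame-level alignment. Expression 2 is not reproduced in this text, so the exact form used here is an assumption: for each frame, the log-likelihood under the aligned state (accept) is added to the negated log-likelihoods under every other state (reject), and the sum is scaled by the empirical factor. All function and parameter names are hypothetical, and diagonal-covariance GMMs are assumed for simplicity.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, components):
    """Log density of a GMM given as a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in components]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def accept_reject_terms(frames, alignment, state_gmms):
    """Return (accept, reject) log-likelihood sums over all frames.

    accept: log p(frame | aligned state)
    reject: log of the reciprocal of p(frame | state), for every other state
    """
    n_states = len(state_gmms)  # total number of states being trained
    accept = 0.0
    reject = 0.0
    for frame, aligned in zip(frames, alignment):
        accept += log_gmm(frame, state_gmms[aligned])
        for j in range(n_states):
            if j != aligned:
                # The same pdf as the accept case, but in the denominator.
                reject -= log_gmm(frame, state_gmms[j])
    return accept, reject


def mars_style_supplement(frames, alignment, state_gmms, alpha):
    """Assumed combination: alpha scales the accept and reject terms together."""
    accept, reject = accept_reject_terms(frames, alignment, state_gmms)
    return alpha * (accept + reject)


if __name__ == "__main__":
    # Toy data: two single-Gaussian states and two one-dimensional frames.
    toy_gmms = [[(1.0, [0.0], [1.0])], [(1.0, [4.0], [1.0])]]
    toy_frames = [[0.0], [4.0]]
    print(mars_style_supplement(toy_frames, [0, 1], toy_gmms, alpha=1.0))
    print(mars_style_supplement(toy_frames, [1, 0], toy_gmms, alpha=1.0))
```

In this toy setting, an alignment that matches each frame to its nearest state scores higher than the swapped alignment, because the remaining state rejects each frame strongly. How this supplement is combined with the ML term in the original expression cannot be recovered from this text.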

In one scenario, maximization with respect to a hidden HMM state [*] may be expressed using the objective function for MARS training as:

(3)

Further, the ML objective function is given as:

(4)

Comparing (3) and (4), it can be observed that the MARS training objective function and the ML objective function differ by one term. This term provides rejection likelihood for the frame when it has been aligned with th...