
Noise Adaptive Acoustic Spectral Prototypes for Speech Recognition

IP.com Disclosure Number: IPCOM000040122D
Original Publication Date: 1987-Sep-01
Included in the Prior Art Database: 2005-Feb-01
Document File: 3 page(s) / 27K

Publishing Venue

IBM

Related People

Nadas, AJ: AUTHOR [+2]

Abstract

A speech recognizer typically comprises a signal processor, which extracts features from speech that form the components of a feature vector; a vector quantizer, which classifies each feature vector into one of a small set of classes; and a back end, which determines the most probable word from the quantizer output. Background noise, which can change from the level for which the system was originally adapted, can adversely affect recognition. According to this invention, the quantizer includes an explicit model of the effect of background noise in order to reduce the quantizer's sensitivity to changes in noise level.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.



A speech recognizer typically comprises a signal processor, which extracts features from speech that form the components of a feature vector; a vector quantizer, which classifies each feature vector into one of a small set of classes; and a back end, which determines the most probable word from the quantizer output. Background noise, which can change from the level for which the system was originally adapted, can adversely affect recognition. According to this invention, the quantizer includes an explicit model of the effect of background noise in order to reduce the quantizer's sensitivity to changes in noise level. The effect of background noise is included in the quantizer by modelling the observable amplitude Z at the output of each of the various filters as the maximum of X (the signal energy in a critical band) and Y (the noise energy in a critical band), so that Z = max(X,Y). In accordance with the invention, an algorithm is provided for learning the distribution of the pure signal X under the hypothesis that the distribution of the noise Y is known. The model is adapted to new noise conditions by gathering new noise statistics from time to time and using these statistics to update the distribution of the noise.
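The max-model can be checked numerically: when X and Y are independent, the observed Z = max(X,Y) has a distribution function equal to the product of the two individual distribution functions, P(Z <= z) = P(X <= z) P(Y <= z). A minimal Monte Carlo sketch of this property follows; the exponential energy distributions are our illustrative assumption, not taken from the disclosure.

```python
import math
import random

# Sketch of the noise model Z = max(X, Y): X is the signal energy and Y
# the noise energy in one critical band.  For independent X and Y,
# P(Z <= z) = P(X <= z) * P(Y <= z).  The exponential distributions
# below are illustrative assumptions, not taken from the disclosure.
random.seed(0)
N = 200_000
signal = [random.expovariate(1.0) for _ in range(N)]   # X, mean energy 1.0
noise = [random.expovariate(2.0) for _ in range(N)]    # Y, mean energy 0.5
observed = [max(x, y) for x, y in zip(signal, noise)]  # Z, the filter output

z = 1.0
empirical = sum(v <= z for v in observed) / N
product = (1 - math.exp(-1.0 * z)) * (1 - math.exp(-2.0 * z))
print(f"F_Z({z}): empirical {empirical:.3f} vs product of CDFs {product:.3f}")
```

When the noise energy Y dominates a band, Z carries no direct information about X there, which is why the distribution of the pure signal must be learned indirectly, as in the EM procedure described in the text.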

The outputs of the filters are modelled as statistically mutually independent of one another. The random variable Z is observable, but X and Y cannot be observed. The individual distributions of the unobservable signals are learned by using an EM (Expectation-Maximization) algorithm in conjunction with a K-MEANS clustering algorithm. In particular, the following version of the K-MEANS clustering algorithm is employed:

Step 0. Guess initial probability densities for each prototype (i.e., class), and guess prototype probabilities.
Step 1. Classify all feature vectors by maximizing the probability of the class times the probability density of the feature vector given the class.
Step 2. Re-estimate the probability density for each prototype by using only feature vectors belonging to the class.
Step 3. Go to Step 1 or quit.

Since the procedure is the same for each prototype, a description of the model and its re-estimation for one prototype is set forth. The EM algorithm, which addresses the fact that complete signal and noise information is not available, is an iterative algorithm defined as follows.

Stage 0. Guess initial values for the unknown probability parameters.
Stage 1. Using the current values of the unknowns for the probability weights, compute the conditional expected value of the log likelihood of the complete data given the incomplete data. Since the log likelihood is a fun...
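The four K-MEANS steps above can be sketched in a few lines. Everything specific below is our illustrative assumption: the prototype densities are 1-D Gaussians (in the disclosure they come from the max-noise model), initialization uses data quantiles, and a fixed iteration count stands in for the stopping test of Step 3.

```python
import math
import random

def density(x, mu, var):
    """1-D Gaussian density; an illustrative stand-in for the
    prototype densities derived from the max-noise model."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def kmeans_density(data, k, iters=20):
    # Step 0: guess initial densities (quantile means, unit variance)
    # and uniform prototype probabilities.
    srt = sorted(data)
    protos = [(srt[(2 * j + 1) * len(srt) // (2 * k)], 1.0) for j in range(k)]
    priors = [1.0 / k] * k
    for _ in range(iters):
        # Step 1: classify each vector by prior * class-conditional density.
        classes = [[] for _ in range(k)]
        for x in data:
            c = max(range(k), key=lambda j: priors[j] * density(x, *protos[j]))
            classes[c].append(x)
        # Step 2: re-estimate each prototype using only its own vectors.
        for j, pts in enumerate(classes):
            if pts:
                mu = sum(pts) / len(pts)
                var = sum((p - mu) ** 2 for p in pts) / len(pts) or 1e-6
                protos[j] = (mu, var)
                priors[j] = len(pts) / len(data)
        # Step 3: go to Step 1 (fixed iteration count here) or quit.
    return protos, priors

random.seed(1)
data = ([random.gauss(0.0, 1.0) for _ in range(300)]
        + [random.gauss(5.0, 1.0) for _ in range(300)])
protos, priors = kmeans_density(data, k=2)
print(sorted(round(mu, 2) for mu, _ in protos))
```

On this two-cluster toy data the recovered prototype means settle near 0 and 5. The hard classification in Step 1 is what distinguishes this K-MEANS variant from the soft-weighted EM stages that follow in the text.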