Short Term Low Moments Normalization for Speech Recognition

IP.com Disclosure Number: IPCOM000037057D
Original Publication Date: 1989-Nov-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 4 page(s) / 30K

Publishing Venue

IBM

Related People

Nahamoo, N: AUTHOR [+2]

Abstract

A technique is described whereby improved speech recognition is achieved through the use of short term moments normalization.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 39% of the total text.

Feature vectors employed for speech recognition are commonly degraded by spectral changes and ambient noise variation in the recording environment. These degradations, along with co-articulation effects and intra-speaker variations, result in an inconsistent mapping between phonetic-acoustic events and their feature vector strings. Techniques such as Spectral Subtraction [1], Blind Deconvolution [2], and Single Frame Mean Normalization [3] have been shown to provide good immunity to some of these sources of error, but each usually introduces undesirable side effects in place of the errors it removes.

For example, Single Frame Mean Normalization yields feature vectors that carry little information about the energy content of the frames. Energy is a robust feature for distinguishing between many speech events, as well as between speech and noise. Also, when a total energy normalization is performed, cues present in the dynamic behavior of the speech signal are further obscured, which degrades recognition accuracy.

The concept described herein uses Short Term Moments Normalization, which generalizes the Mean Normalization technique in two respects:

1. By incorporating a relaxation mechanism, the absolute moments of the current frame are adjusted using information not only from the current time frame but also from prior time frames. This provides gain normalization, as in Single Frame Mean Normalization, while preserving the local energy content, which is needed for the many speech events that are partially discriminated by their energy.

2. There is evidence that, in addition to the mean (zeroth-order moment), the higher-order moments of the spectral frames of different speakers are consistently related to each other over different speech events. This implies that removing such moments from the spectral frames could result in more consistent speaker-independent feature vectors.
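
The relaxation idea in item 1 can be illustrated for the zeroth-order moment alone. The sketch below is not the disclosure's method: the exponential form of the relaxation, the function name short_term_mean_normalize, and the parameter alpha are all assumptions introduced for illustration, since this excerpt does not give the actual update rule.

```python
import numpy as np

def short_term_mean_normalize(frames, alpha=0.9):
    """Subtract a relaxed (running) mean from each log-spectral frame.

    frames: array of shape (T, N), one log-magnitude spectrum per row.
    alpha:  hypothetical relaxation factor in [0, 1). With alpha = 0 this
            reduces to single-frame mean normalization.
    """
    frames = np.asarray(frames, dtype=float)
    out = np.empty_like(frames)
    running = frames[0].mean()  # start from the first frame's own mean
    for t, x in enumerate(frames):
        # Assumed exponential relaxation: blend past and current frame means.
        running = alpha * running + (1.0 - alpha) * x.mean()
        out[t] = x - running
    return out
```

With alpha = 0 every output frame has zero mean, so frame-to-frame energy differences are lost. With alpha close to 1 the subtracted level tracks the slowly varying gain, so a frame that is louder than its recent past keeps a positive offset after normalization.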

Therefore, the concept considers a speech recognition system, used with a signal processing unit, that at some stage of processing represents each frame as a feature vector whose elements are the log-magnitude spectra of the frame signal over different frequency bands. Let X_t, t = 1, 2, ..., be the sequence of such feature vectors. Each feature vector is represented in terms of two components: the moments contribution and the residual. Since the vectors are N-dimensional, they are assumed to represent polynomials of degree N-1 defined at the integers from 1 to N. Component j of a feature vector X_t in terms of the first M moments is given by equation (1), where M takes any value between 0 and N-1. The residual vecto...
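
Equation (1) is not reproduced in this excerpt, so the following is only a sketch of one way to split a feature vector into a low-order polynomial part and a residual. It assumes a least-squares polynomial fit over the band indices 1 to N; the function name split_moments_residual and the choice of a least-squares fit are assumptions, not taken from the disclosure.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def split_moments_residual(x, M):
    """Split one N-dimensional feature vector into a degree-M polynomial
    part (assumed stand-in for the 'moments contribution') and a residual.

    x: log-magnitude spectrum of one frame, length N.
    M: polynomial degree, 0 <= M <= N - 1.
    """
    x = np.asarray(x, dtype=float)
    N = x.shape[0]
    if not 0 <= M <= N - 1:
        raise ValueError("M must lie between 0 and N-1")
    j = np.arange(1, N + 1)           # band indices 1..N
    coeffs = P.polyfit(j, x, M)       # least-squares fit, lowest order first
    smooth = P.polyval(j, coeffs)     # polynomial part evaluated at each band
    return coeffs, x - smooth         # (coefficients, residual vector)
```

Under this assumption, M = 0 removes only the frame mean, which matches Single Frame Mean Normalization, while M = N-1 interpolates the vector exactly and leaves a residual of zero.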