Speech Recognition using an Auditory Model

IP.com Disclosure Number: IPCOM000116739D
Original Publication Date: 1995-Oct-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 4 page(s) / 137K

Publishing Venue

IBM

Related People

Neti, C: AUTHOR

Abstract

Disclosed is an algorithm for acoustic processing based on mammalian auditory processing. Processing techniques used in the algorithm allow speech recognition to degrade more gradually with increasing levels of speech-like noise than conventional techniques based on Fourier spectrum analysis.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 46% of the total text.

      Speech recognition is performed by generating feature vectors
of short segments of speech, and by then performing statistical
pattern recognition on a finite grouping of these vectors,
corresponding to a unit of speech, such as a phone or a word.  Several
approaches have been attempted in the development of a "noise-robust"
speech recognition system that accounts for variations in feature
vectors due to noise, yet the accuracy of speech recognition systems
still degrades in the presence of noise (1,2).  Some of these
approaches are based on forming distortion metrics which are less
sensitive to variability of the feature vectors.  Others concentrate
on additive compensation in the feature vector domain.  In some cases
transformations are used in the vector quantization (VQ) domain, from
the noisy signal to a reference VQ codebook, while in other cases
speech prototypes are developed to account for specific variations
due to noise.
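
      As an illustration of the conventional front end described
above, the following minimal sketch cuts a signal into short
overlapping segments and computes one Fourier-based feature vector per
segment.  The frame length, hop size, number of bins, log-magnitude
feature and test tone are all assumptions chosen for illustration;
none of them is taken from the disclosure.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping short segments (frames)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_spectrum_features(frame, n_bins):
    """Feature vector: log magnitude of the first n_bins DFT coefficients.

    A plain DFT is used for clarity; a real system would use an FFT.
    """
    n = len(frame)
    feats = []
    for k in range(n_bins):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        # Small floor avoids log(0) on empty bins (assumed value).
        feats.append(math.log(abs(coeff) + 1e-10))
    return feats


# Hypothetical input: a 1 kHz tone sampled at 8 kHz (not from the source).
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(800)]

frames = frame_signal(signal, frame_len=160, hop=80)
vectors = [log_spectrum_features(f, n_bins=40) for f in frames]
```

Each entry of `vectors` stands for one short segment; a recognizer
would then apply statistical pattern recognition to a finite group of
such vectors.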

      The approach of the presently-disclosed algorithm defines a set
of operations that preserve the information necessary for speech
recognition while reducing the contribution due to noise.  Models of
auditory processing have been shown to exhibit such properties
(3,4,5).  The presently-disclosed auditory processing model uses some
principles developed in (6) and Wang (7), extending these principles
to include a particular method for temporal processing that is
suited to the IBM* Tangora recognizer (8), based on a hidden Markov
model, giving this recognizer accurate speech performance and
tolerance to noise.

      The processing technique uses a model of the cochlea
represented by steps described in Equations (1) through (6).  A
linear filtering operation of this model is described in Equation
(1).  In the current implementation of the method, the filter is
modeled as a lowpass filter followed by a bandpass filter.  Filtering
is carried out in the time domain by applying the bilinear transform
to the continuous-time filter transforms (6).  The next step is a model
of the hair cell, which is given by Equation (2).  As proposed in
(7), this step is followed by a spatial derivative operation across
the s-axis, as shown in Equation (3).  The spatial derivative
operation models lateral inhibitory neural processing and is
suggested as an important step toward noise reduction in the final
representation (7).  The result given in Equation (3) can be
approximated, as in (5), by taking the partial derivative with respect
to s at the peaks of the result determined in Equation (1).  Following
this operation, a half-wave rectified signal is gene...
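
      The steps named above (linear filtering, hair cell, spatial
derivative, half-wave rectification) can be sketched as follows.
Equations (1) through (6) are not reproduced in this abbreviated text,
so every functional form below is an assumption: each channel uses a
single first-order lowpass filter as a stand-in for the lowpass and
bandpass cascade, the hair cell is approximated by a tanh compression,
and the spatial derivative is a first difference between neighboring
channels.  The function names, cutoff frequencies and gain are
hypothetical.  Only the bilinear transform itself, s -> 2*fs*(1 -
z^-1)/(1 + z^-1), is the standard technique.

```python
import math


def bilinear_lowpass(cutoff_hz, fs):
    """Discretize H(s) = wc / (s + wc) with the bilinear transform.

    Substituting s = 2*fs*(1 - z^-1)/(1 + z^-1) gives
    y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1].
    """
    wc = 2.0 * math.pi * cutoff_hz
    k = 2.0 * fs
    b0 = wc / (k + wc)
    a1 = (wc - k) / (k + wc)
    return b0, b0, a1


def filter_first_order(x, b0, b1, a1):
    """Run the first-order difference equation in the time domain."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x_prev - a1 * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def hair_cell(v, gain=10.0):
    """Assumed compressive nonlinearity standing in for Equation (2)."""
    return math.tanh(gain * v)


def cochlea_sketch(x, fs, cutoffs):
    """Filter bank -> hair cell -> spatial difference -> half-wave rectify."""
    # One channel per cutoff; the channel index plays the role of the s-axis.
    channels = [filter_first_order(x, *bilinear_lowpass(fc, fs))
                for fc in cutoffs]
    hair = [[hair_cell(v) for v in ch] for ch in channels]
    # First difference across adjacent channels (assumed form of Equation (3)).
    deriv = [[hi - lo for lo, hi in zip(hair[i], hair[i + 1])]
             for i in range(len(hair) - 1)]
    # Half-wave rectification keeps only the positive part.
    return [[max(0.0, v) for v in ch] for ch in deriv]


# Hypothetical input: a 300 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 300 * n / fs) for n in range(400)]
cutoffs = [200.0, 400.0, 800.0, 1600.0]
rectified = cochlea_sketch(tone, fs, cutoffs)
```

The sketch yields one rectified sequence per pair of adjacent
channels, so four channels give three output sequences.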