Speech Recognition via Cross Correlation of Differentiated Spectra

IP.com Disclosure Number: IPCOM000109570D
Original Publication Date: 1992-Sep-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 3 page(s) / 130K

Publishing Venue

IBM

Related People

Destombes, F: AUTHOR

Abstract

The apparatus disclosed here performs a common speech recognition task (frame labeling) using a distance based on the differentiated frequency spectra. This makes it possible to take the positions of speech formants into account without requiring a complex analysis to determine those positions precisely.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

       The apparatus disclosed here performs a common speech
recognition task (frame labeling) using a distance based on the
differentiated frequency spectra.  This makes it possible to take the
positions of speech formants into account without requiring a complex
analysis to determine those positions precisely.

      The apparatus comprises:
      1. A microphone linked to an acoustic component.
      2. An acoustic component to determine various parameters,
including the frequency spectrum of successive speech "frames" (short
speech segments).  For example, such a spectrum might be computed by
Linear Predictive Coding.
      3. A memory device holding models of speech frames with labels
identifying them.
      4. A recognition mechanism to compare incoming speech sounds or
utterances to the models in the memory device, based on a computed
"distance".
      5. A control mechanism to coordinate the device operation.
      6. Interface devices (e.g., keyboard, mouse) to allow the
user to communicate with the control mechanism.

      Under control of the control mechanism, the following
operations are performed:
      - Speech sounds or speech utterances pronounced by the user of
the device are captured by the microphone.  The acoustic component
computes spectra for frames composing those sounds and utterances.
Each spectrum can be expressed as a series of N values Si, i=1... N,
where Si is the intensity of sound at a frequency fi and frequencies
fi, i=1... N are in ascending sequence.
      - The recognition mechanism attempts to identify the frames in
the sounds or utterances by comparison with the models in the
memory device, by computing a distance D between each frame in the
sound/utterance and the corresponding frames in the models.  The model
which yields the smallest distance is considered recognized, and
its label is used to identify the unknown frame.  The control
mechanism may then consolidate individual frame labels to make more
global decisions, e.g., utterance recognition.
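
      The labeling step described above can be sketched as follows.
This is only an illustration under stated assumptions: the names
label_frame, label_utterance and majority_label are hypothetical, the
Euclidean distance is a stand-in (it is not the distance D of this
disclosure), and the majority vote is just one possible way to
consolidate frame labels.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Spectrum = Sequence[float]
Distance = Callable[[Spectrum, Spectrum], float]


def euclidean_distance(s: Spectrum, m: Spectrum) -> float:
    """Stand-in distance for the demo only; NOT the distance D of the disclosure."""
    if len(s) != len(m):
        raise ValueError("spectra must have the same number of values")
    return sum((a - b) ** 2 for a, b in zip(s, m)) ** 0.5


def label_frame(frame: Spectrum,
                models: Dict[str, Spectrum],
                distance: Distance = euclidean_distance) -> str:
    """Return the label of the model at the smallest distance from the frame."""
    if not models:
        raise ValueError("at least one model is required")
    return min(models, key=lambda label: distance(frame, models[label]))


def label_utterance(frames: Sequence[Spectrum],
                    models: Dict[str, Spectrum],
                    distance: Distance = euclidean_distance) -> List[str]:
    """Label every frame of an utterance independently."""
    return [label_frame(frame, models, distance) for frame in frames]


def majority_label(labels: Sequence[str]) -> str:
    """Hypothetical consolidation rule: keep the most frequent frame label."""
    if not labels:
        raise ValueError("no labels to consolidate")
    return Counter(labels).most_common(1)[0][0]
```

      Passing the distance as a parameter keeps the labeling logic
independent of how D is actually computed, so a distance based on
differentiated spectra could be substituted for the stand-in without
changing the rest of the sketch.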

      If the spectrum of a frame is S and the spectrum of a model
frame is M, the distance D between S and M is computed by the
following operations:
      Spectrum differentiation:
      S'i = S(i+1) - Si,  i = 1 ... N - 1
     ...
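
      The differentiation step can be sketched as below, assuming
that "spectrum differentiation" means the first difference between
adjacent spectral values, S(i+1) - S(i), which is consistent with the
index range i = 1 ... N - 1.  The function name
differentiate_spectrum is hypothetical, and the remaining operations
of the distance computation are not shown.

```python
from typing import List, Sequence


def differentiate_spectrum(s: Sequence[float]) -> List[float]:
    """Return the N-1 forward differences s[i+1] - s[i] of an N-value spectrum.

    Assumption: the differentiation is a simple first difference between
    neighbouring frequency bins (bins listed in ascending frequency order).
    """
    if len(s) < 2:
        raise ValueError("a spectrum needs at least two values")
    return [s[i + 1] - s[i] for i in range(len(s) - 1)]


if __name__ == "__main__":
    # Toy spectrum with a single peak at the third value.
    spectrum = [1.0, 3.0, 6.0, 4.0]
    print(differentiate_spectrum(spectrum))  # [2.0, 3.0, -2.0]
```

      In the differentiated spectrum a local peak, such as a formant,
appears as a change of sign from positive to negative, so peak
positions are reflected in the result without being located
explicitly.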