
Speech Recognition using a Differentiated Spectrum

IP.com Disclosure Number: IPCOM000109560D
Original Publication Date: 1992-Sep-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 2 page(s) / 84K

Publishing Venue

IBM

Related People

Destombes, F: AUTHOR

Abstract

The apparatus disclosed here performs speech recognition using a distance based not only on the frequency spectrum of speech, but also on the differentiated spectrum. This makes it possible to take better account of the position of speech formants without requiring complex analysis to determine their position precisely.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Speech Recognition using a Differentiated Spectrum

       The apparatus disclosed here performs speech recognition
using a distance based not only on the frequency spectrum of speech,
but also on the differentiated spectrum. This makes it possible to
take better account of the position of speech formants without
requiring complex analysis to determine their position precisely.

      The apparatus comprises:
      1. A microphone linked to an acoustic component.
      2. An acoustic component to determine various parameters,
including the frequency spectrum of successive speech "frames" (short
speech segments).
      3. A memory device holding models of speech sounds or speech
utterances together with labels identifying them.
      4. A recognition mechanism to compare incoming speech sounds or
utterances to the models in the memory device, based on computing a
"distance".
      5. A control mechanism to coordinate the device operation.
      6. Interface devices (e.g., keyboard, mouse) to allow the
user to communicate with the control mechanism.
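
      As an illustration of how items 3 and 4 fit together, the
following is a minimal sketch, not the disclosed implementation. The
names Model, recognize and frame_distance are hypothetical. The sketch
assumes that an utterance is compared frame by frame only with models
of the same length and that per-frame distances are simply summed; the
frame distance itself is passed in as a parameter and left open.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # one spectrum: the values S(1)..S(N) of a frame


@dataclass
class Model:
    """Hypothetical entry in the memory device (item 3)."""
    label: str           # identifies the sound or utterance
    frames: List[Frame]  # stored reference spectra, one per frame


def recognize(
    utterance: List[Frame],
    models: List[Model],
    frame_distance: Callable[[Frame, Frame], float],
) -> Optional[str]:
    """Return the label of the model with the smallest summed distance.

    Assumption: only models with as many frames as the utterance are
    considered, and corresponding frames are paired by position.
    """
    best_label, best_total = None, float("inf")
    for model in models:
        if len(model.frames) != len(utterance):
            continue  # skip models that cannot be paired frame by frame
        total = sum(
            frame_distance(u, m) for u, m in zip(utterance, model.frames)
        )
        if total < best_total:
            best_label, best_total = model.label, total
    return best_label
```

      Passing the distance as a parameter keeps the sketch independent
of the particular distance used; any function of two frames that
returns a number can be supplied.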

      Under control of the control mechanism, the following
operations are performed:

      Speech sounds or speech utterances pronounced by the user of
the device are captured by the microphone.

      The acoustic component computes a spectrum for each frame of
those sounds and utterances.  Each spectrum can be expressed as a
series of N values S(i), i=1...N, where S(i) is the sound intensity
at frequency f(i), and the frequencies f(i), i=1...N, are in
ascending order.
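
      To make the notation concrete, the following is a minimal
sketch under stated assumptions, not the disclosed method. It assumes
that S(i) is taken as the magnitude of a discrete Fourier transform of
the frame, and that the "differentiated spectrum" can be approximated
by the first difference S(i+1) - S(i) along the frequency axis. The
function names magnitude_spectrum and differentiated_spectrum are
hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Return S(i) for the non-negative DFT bins of one frame.

    Bins are returned in ascending frequency order. A direct DFT is
    used for clarity; it is slow for long frames.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        spectrum.append(abs(acc))
    return spectrum


def differentiated_spectrum(s: Sequence[float]) -> List[float]:
    """Assumed approximation: first difference S(i+1) - S(i)."""
    return [b - a for a, b in zip(s, s[1:])]
```

      With this approximation, a spectral peak shows up in the
differentiated spectrum as a positive value followed by a negative
one, so the sign change marks the peak position without an explicit
peak search.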

      The recognition mechanism attempts to identify the sounds or
utterances by comparison with the models in the library, by computing
a distance D between each frame in the sound/utterance and
corresponding frames in t...