
Recognition of Stable Sounds

IP.com Disclosure Number: IPCOM000100862D
Original Publication Date: 1990-Jun-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 4 page(s) / 151K

Publishing Venue

IBM

Related People

Crepy, H: AUTHOR [+4]

Abstract

A general sound recognition technique is described for use with speech recognition equipment so as to provide fast and accurate real-time recognition of stable phonemes, such as sustained vowels or continuants pronounced in isolation. The concept described herein concentrates on three distinct areas: 1) Choice of the projection space and associated distance - As with all pattern recognition problems, this is critical to the performance of the system. 2) Iterative compaction process for the model set - By eliminating redundancy and generally cleaning up the model set, the efficiency and performance of the recognition process are increased. Also, the use of separate training and recognition thresholds allows greater flexibility in fine-tuning the system. 3) Application of speech recognition to speech therapy.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 40% of the total text.


       A general sound recognition technique is described for
use with speech recognition equipment so as to provide fast and
accurate real-time recognition of stable phonemes, such as sustained
vowels or continuants pronounced in isolation. The concept described
herein concentrates on three distinct areas as follows:
      1)   Choice of the projection space and associated distance -
As with all pattern recognition problems, this is critical to the
performance of the system.
      2)   Iterative compaction process for the model set - By
eliminating redundancy and generally cleaning up the model set, the
efficiency and performance of the recognition process are increased.
Also, the use of separate training and recognition thresholds allows
greater flexibility in fine-tuning the system.
      3)   Application of speech recognition to speech therapy -
Speech recognition provides an objective measure of the adequacy of
pronunciation. Real-time feedback of this measure allows the speaker
to correct some defects in pathological speech. The target vowels
(model set) can be constituted either from a mix of "normal" voices
or from the best attempts of the speaker undergoing therapy, who then
tries to reproduce them consistently.

      Speech Coding
      The input speech is sampled and digitized (e.g., with a 12-bit
ADC at a 9600 Hz sampling rate). Samples are grouped into frames of
roughly ten milliseconds (e.g., 128 samples at 9600 Hz = 13.33 msec).
For each frame, a digital spectrum is computed and coded as a series
of values, in decibels (dB), at various frequency points in the audio
range. For example, a linear predictive coding (LPC) spectrum can be
computed from a limited set of autocorrelation coefficients using the
Durbin algorithm, giving forty-four energy values at frequency points
spaced 75 Hz apart from 150 Hz to 3,375 Hz. The spectrum constitutes
a projection of the physical sound into a multi-dimensional space
(here, a 44-dimensional space), in which each frame can be plotted as
a point.
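The coding step can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the original implementation: the predictor order (12), the Hamming window, and the tiny energy floor for silent frames are not given in the text, and all function names are hypothetical. The LPC spectrum is evaluated as the prediction-error energy divided by |A(e^jw)|^2 at the 44 frequency points, then converted to dB.

```python
import numpy as np

FS = 9600                              # sampling rate (Hz), example value from the text
FRAME_LEN = 128                        # samples per frame: 128 / 9600 Hz = 13.33 ms
LPC_ORDER = 12                         # assumption: predictor order is not given in the text
FREQS = np.arange(150, 3375 + 1, 75)   # 44 frequency points, 150 Hz to 3,375 Hz


def split_frames(signal, frame_len=FRAME_LEN):
    """Group samples into consecutive, non-overlapping frames (trailing remainder dropped)."""
    n = len(signal) // frame_len
    return np.reshape(np.asarray(signal[: n * frame_len], dtype=float), (n, frame_len))


def autocorrelation(x, order):
    """Return r[k] = sum_n x[n] * x[n + k] for k = 0..order."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Durbin recursion: predictor A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p and final error energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k             # prediction error energy shrinks at each order
    return a, err


def lpc_spectrum_db(frame, order=LPC_ORDER, freqs=FREQS, fs=FS):
    """LPC power spectrum of one frame, in dB, sampled at the given frequencies."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))  # assumption: Hamming window
    r = autocorrelation(x, order)
    r[0] += 1e-12                      # assumption: tiny floor so silent frames do not divide by zero
    a, err = levinson_durbin(r, order)
    w = 2.0 * np.pi * np.asarray(freqs) / fs
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a     # A(e^{jw}) at each frequency
    return 10.0 * np.log10(err / np.abs(A) ** 2)
```

Applying `lpc_spectrum_db` to each row of `split_frames(signal)` yields one 44-dimensional point per frame. Evaluating the all-pole model only at the 44 listed frequencies, rather than over a full FFT grid, keeps each frame's representation at exactly the dimensionality the text describes.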

      Acoustic Distance
      The projection space (a.k.a. model space) is chosen so that a
distance can be computed between its points. The distance is chosen
so that similar sounds, which have similar frequency spectra, are
represented by points close to one another, while points representing
dissimilar sounds lie far apart. For example, the distance can be the
average of the absolute differences of the coordinates (spectrum
energies), in dB. Given two spectra S_i and S'_i, i = 0 to 43, the
distance is:

    $$ d(S, S') = \frac{1}{44} \sum_{i=0}^{43} \left| S_i - S'_i \right| $$

This distance is low (ideally 0 dB, but in practice down to about
1 dB) for very similar sounds (e.g., the same vowel pronounced by the
same speaker), and high for dissimilar sounds (around 8 to 10 dB for
different vowels, such as /a/ and /i/).
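A minimal Python sketch of this distance, assuming each spectrum is an array of 44 dB values. The `nearest_model` helper, the dictionary of labelled model spectra, and the `threshold_db` rejection rule are hypothetical illustrations of how such a distance might be used against a model set; they are not taken from the disclosure.

```python
import numpy as np


def acoustic_distance(s, s_prime):
    """Average absolute difference, in dB, between two spectra of equal length."""
    s = np.asarray(s, dtype=float)
    s_prime = np.asarray(s_prime, dtype=float)
    if s.shape != s_prime.shape:
        raise ValueError("spectra must have the same number of frequency points")
    return float(np.mean(np.abs(s - s_prime)))


def nearest_model(spectrum, models, threshold_db):
    """Hypothetical helper: closest model label, or None if its distance exceeds threshold_db."""
    best_label, best_dist = None, float("inf")
    for label, model in models.items():
        d = acoustic_distance(spectrum, model)
        if d < best_dist:
            best_label, best_dist = label, d
    # Reject the match when even the closest model is too far away.
    return (best_label if best_dist <= threshold_db else None), best_dist
```

For two spectra that differ by exactly 2 dB at every point, `acoustic_distance` returns 2.0. Averaging rather than summing keeps the result on the per-point dB scale used in the figures above (about 1 dB for repetitions of one vowel, 8 to 10 dB between different vowels), so a threshold can be read directly in dB.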

      Recognitio...