Formant Algorithm for Intelligible Speech Synthesis

IP.com Disclosure Number: IPCOM000104935D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 65K

Publishing Venue

IBM

Related People

Farrett, PW: AUTHOR

Abstract

Disclosed is an algorithmic approach for formant composition in text-to-speech synthesis in which the perception of (speech) formants is more intelligible.

This is the abbreviated version, containing approximately 52% of the total text.

      Background - A problem with current text-to-speech synthesis
technologies, in terms of user acceptance, is the lack of perceived
intelligibility in one part of the audio bandwidth (i.e., a
particular frequency region of the audio speech spectrum):
specifically, formants f1, f2, and f3, which span the audio speech
bandwidth from approximately 200 Hz to 2700 Hz.  (The part of audio
reconstruction most sensitive with respect to speech intelligibility
is the frequency range of 300 Hz.  Curiously, the resonant frequency
of the ear canal, which is vital for the hearing/speaking feedback
loop, is approximately the same.)  In a speech synthesizer, if the
vocal tract model fails to produce one of the lower formants
(particularly f1 or f2), the degree of intelligibility is greatly
reduced.  Conversely, the higher formants (f4-f5, or f4-f10,
depending on how the audio speech bands are segmented) denote
personality.  Generally, the 1st, 2nd, and 3rd formants are said to
determine meaning, while the 4th through Nth formants determine
personality.
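
      The effect of a missing lower formant can be illustrated with
a minimal sketch.  It is not taken from this disclosure: it assumes a
vocal tract model built as a cascade of second-order digital
resonators, one per formant, which is a common arrangement in formant
synthesizers.  The sampling rate, formant frequencies, bandwidths,
and the names resonator_coeffs, resonate, synthesize, and
magnitude_at are all hypothetical placeholders chosen for
illustration.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (unity gain at DC)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Pass a signal through one second-order resonator (one formant)."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(source, formants, fs_hz):
    """Cascade one resonator per (frequency, bandwidth) pair."""
    out = list(source)
    for freq_hz, bw_hz in formants:
        out = resonate(out, freq_hz, bw_hz, fs_hz)
    return out


def magnitude_at(signal, freq_hz, fs_hz):
    """Magnitude of a single-frequency DFT probe of the signal."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return math.hypot(re, im)


# Assumed values for illustration only (not from the disclosure).
FS_HZ = 10000
FORMANTS = [(730.0, 60.0), (1090.0, 90.0), (2440.0, 120.0)]  # f1, f2, f3

# Crude voiced source: an impulse train with a 100 Hz pitch, 0.2 s long.
source = [1.0 if n % 100 == 0 else 0.0 for n in range(2000)]

full = synthesize(source, FORMANTS, FS_HZ)
without_f1 = synthesize(source, FORMANTS[1:], FS_HZ)  # model fails to produce f1

print("energy probe near f1, all formants:", magnitude_at(full, 700.0, FS_HZ))
print("energy probe near f1, f1 missing:  ", magnitude_at(without_f1, 700.0, FS_HZ))
```

      Dropping the f1 resonator from the cascade removes the spectral
peak near f1 while leaving f2 and f3 in place, which is the kind of
missing-formant failure described above.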

      Besides the failure to reproduce the formants accurately,
another problem is deciding what formant values allophones should
take as a result of the sounds preceding or succeeding them (i.e.,
acoustic resonances that are contingent upon phoneme placement).
This is the area that this disclosure addresses.
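
      To make the problem concrete, the following sketch shows one
simple way that formant values could be made to depend on a
neighboring sound.  It assumes plain linear interpolation between
the formant targets of two adjacent phonemes; this is a generic
technique used here for illustration and is not the rule proposed by
this disclosure.  The target values and the name formant_transition
are hypothetical.

```python
def formant_transition(start, end, n_frames):
    """Linearly interpolate (f1, f2, f3) targets across n_frames frames."""
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    track = []
    for i in range(n_frames):
        w = i / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        track.append(tuple((1.0 - w) * s + w * e for s, e in zip(start, end)))
    return track


# Assumed formant targets in Hz for two vowels (illustrative values only).
VOWEL_A = (730.0, 1090.0, 2440.0)
VOWEL_B = (300.0, 870.0, 2240.0)

for f1, f2, f3 in formant_transition(VOWEL_A, VOWEL_B, 5):
    print(f"f1={f1:7.1f}  f2={f2:7.1f}  f3={f3:7.1f}")
```

      In this sketch the frames nearest each phoneme boundary carry
formant values pulled toward the neighboring sound, so the same
phoneme is realized differently depending on its context.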

      General Description - More intelligible speech can be obtained
when formants during a transition from one phoneme to another start
and end in different values t...