Browse Prior Art Database

Generation of Nasalized Vowels in Text-To-Speech Synthesis

IP.com Disclosure Number: IPCOM000060835D
Original Publication Date: 1986-May-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 1 page(s) / 12K

Publishing Venue

IBM

Related People

Nartey, JNA: AUTHOR [+2]

Abstract

The present method involves synthesizing the nasalization of vowels between consonants in a speech synthesis environment. Briefly, (a) primary speech units --such as diphones-- which are concatenated to form words are scanned for the presence of a nasal consonant, (b) a look-ahead is performed to detect the presence of a second nasal consonant, and (c) if a second nasal consonant is detected, a nasal branch of the synthesizer is turned on for the duration of the intervening vowel. In describing the method in further detail, it is observed that for most phoneme or diphone formant synthesizers, there are 10 to 40 control parameters guiding the synthesizer in producing a speech waveform. These parameters change through time; the entire time ensemble for each parameter-class is often referred to as a "channel".

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 57% of the total text.




One common parameter is called AN (amplitude of nasality). By way of example, let ANi be the amplitude of nasalization as a function of time for a synthesized speech utterance; AN = 0 implies no nasalization. To trigger the algorithm, steady-state nasalization must first be detected at a particular time point i:

ANi > 0 and ANi+1 = ANi,   i = 1, 2, 3, ...   [1]

If condition [1] is true for a particular i, then i is saved, and a search is conducted for a future region of steady-state nasalization from i + t1 to i + t2 (for example, t1 = 5 ms and t2 = 30 ms)...
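The trigger in Eq. [1] and the subsequent look-ahead search can be sketched as follows. This is a minimal sketch, assuming AN is sampled on a uniform time grid; the function name `find_nasal_trigger` and the `frame_ms` parameter are illustrative assumptions, while t1 = 5 ms and t2 = 30 ms are the example values from the text:

```python
def find_nasal_trigger(an, frame_ms=1, t1_ms=5, t2_ms=30):
    """Return (i, j) where i satisfies Eq. [1] and j is a later
    steady-state point in the window [i + t1, i + t2], else None."""
    t1 = t1_ms // frame_ms
    t2 = t2_ms // frame_ms
    for i in range(len(an) - 1):
        # Eq. [1]: ANi > 0 and ANi+1 = ANi
        if an[i] > 0 and an[i + 1] == an[i]:
            # search a future region of steady-state nasalization
            for j in range(i + t1, min(i + t2 + 1, len(an) - 1)):
                if an[j] > 0 and an[j + 1] == an[j]:
                    return i, j
    return None
```

With a channel such as `[0, 0, 5, 5, 0, 0, 0, 4, 4, 0]` (one frame per millisecond), the trigger fires at i = 2 and the search finds a second steady-state point at j = 7, i.e. within the 5-30 ms window.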