
Method for Connecting Speech Synthesis Units

IP.com Disclosure Number: IPCOM000034185D
Original Publication Date: 1989-Jan-01
Included in the Prior Art Database: 2005-Jan-27
Document File: 3 page(s) / 67K

Publishing Venue

IBM

Related People

Saito, T: AUTHOR

Abstract

This article describes a method for connecting speech synthesis units that generates synthesis parameters for naturally co-articulated phonemes. Spectral features of phonemes in natural speech change under the influence of their phonetic environments. This spectral fluctuation has been one obstacle to synthesizing high-quality speech in rule synthesis systems. Most earlier rule-synthesis methods could not synthesize so-called co-articulated phonemes, mainly because they used fixed speech synthesis units. Because such units are extracted from a fixed environment, connecting them with simple smoothing such as linear interpolation is not sufficient to generate appropriate co-articulation in various phonetic environments.

This is the abbreviated version, containing approximately 53% of the total text.


This article describes a method for connecting speech synthesis units that generates synthesis parameters for naturally co-articulated phonemes.

(Image Omitted)

Background

Spectral features of phonemes in natural speech change under the influence of their phonetic environments. This spectral fluctuation has been one obstacle to synthesizing high-quality speech in rule synthesis systems. Most earlier rule-synthesis methods could not synthesize so-called co-articulated phonemes, mainly because they used fixed speech synthesis units. Because such units are extracted from a fixed environment, connecting them with simple smoothing such as linear interpolation is not sufficient to generate appropriate co-articulation in various phonetic environments.

Connecting speech synthesis units using STCV

Given this background, a method is proposed for synthesizing naturally co-articulated phonemes in various phonetic environments. In the proposed method, synthesis parameters for co-articulated phonemes are generated by connecting "transformed synthesis units": before connection, the synthesis units are transformed using STCVs (Spectral Transition Control Vectors) so that they suit the requested phonetic environment. (The concept of an STCV differs from that of a "target", which is commonly used in formant synthesizers for English [1].)

Fig. 1 shows the mechanism for generating a co-articulated phoneme. In the figure, CV and VC units (C: consonant, V: vowel) are used as the speech synthesis units. The example shows how a vowel /VO/ is generated for the environment /C1VOC2/.
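The choice of units in this example can be illustrated with a small sketch. It only shows which CV and VC units would be looked up for a vowel flanked by two consonants. The function name `cv_vc_units`, the vowel inventory, and the sample phoneme strings are hypothetical and do not come from the article.

```python
from typing import List, Tuple

# ASSUMPTION: an illustrative vowel inventory, not taken from the article.
VOWELS = {"a", "i", "u", "e", "o"}


def cv_vc_units(phonemes: List[str]) -> List[Tuple[str, str]]:
    """Return the (CV, VC) unit pair for every vowel flanked by two consonants."""
    pairs = []
    for i in range(1, len(phonemes) - 1):
        c1, vowel, c2 = phonemes[i - 1], phonemes[i], phonemes[i + 1]
        if vowel in VOWELS and c1 not in VOWELS and c2 not in VOWELS:
            # The vowel appears in both units: once after C1, once before C2.
            pairs.append((c1 + vowel, vowel + c2))
    return pairs


print(cv_vc_units(["k", "a", "t"]))  # [('ka', 'at')]
```

Both units of a pair contain the same vowel, which is why the vowel parts of the two units have to be reconciled before they are joined.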

The spectrum of the vowel /VO/ between the consonants /C1/ and /C2/ is obtained by connecting two units, /C1VO/ and /VOC2/. Before connection, the vowel parts of both units are transformed using an STCV, VO(C1, C2). The STCV modifies the original spectral loci of /VO/ in both units so as to adapt their vowel parts to the phonetic environment of /C1...
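The transform-then-connect order described above can be sketched as follows. The available text is cut off before it says how an STCV actually modifies the spectral loci, so this sketch assumes, purely for illustration, that the STCV is an additive offset applied to every vowel frame and that the transformed parts are simply concatenated. The names `Frame`, `apply_stcv`, and `connect_cvc`, the two-parameter frame layout, and all numeric values are hypothetical.

```python
from typing import List

Frame = List[float]  # one frame of synthesis parameters (hypothetical layout)


def apply_stcv(vowel_frames: List[Frame], stcv: Frame) -> List[Frame]:
    """Shift each vowel frame by the STCV.

    ASSUMPTION: the STCV acts as an additive offset; the source text is
    truncated before it defines the real transformation.
    """
    return [[p + d for p, d in zip(frame, stcv)] for frame in vowel_frames]


def connect_cvc(c1_part: List[Frame], vowel_from_cv: List[Frame],
                vowel_from_vc: List[Frame], c2_part: List[Frame],
                stcv: Frame) -> List[Frame]:
    """Transform the vowel parts of both units with one STCV, then join them."""
    return (c1_part
            + apply_stcv(vowel_from_cv, stcv)
            + apply_stcv(vowel_from_vc, stcv)
            + c2_part)


# Toy data: one frame per part, two parameters per frame.
result = connect_cvc(
    c1_part=[[1.0, 2.0]],
    vowel_from_cv=[[3.0, 4.0]],
    vowel_from_vc=[[5.0, 6.0]],
    c2_part=[[7.0, 8.0]],
    stcv=[0.5, -0.5],
)
print(result)  # [[1.0, 2.0], [3.5, 3.5], [5.5, 5.5], [7.0, 8.0]]
```

The point of the sketch is the ordering: the same STCV, selected for the pair (C1, C2), is applied to the vowel part of both units, the consonant parts are left untouched, and connection happens only afterwards.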