
Constructing Method for Speech Synthesis Units

IP.com Disclosure Number: IPCOM000062605D
Original Publication Date: 1986-Dec-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 2 page(s) / 59K

Publishing Venue

IBM

Related People

Ohshima, Y: AUTHOR

Abstract

A segmentation and smoothing method is proposed to build smoothly connectable speech synthesis units from human utterances.

This is the abbreviated version, containing approximately 54% of the total text.


A segmentation and smoothing method is proposed to build smoothly connectable speech synthesis units from human utterances.

Background

The diphone, as a speech synthesis unit [*], enables smooth connection and sophisticated duration control. However, it is difficult to build a diphone that works in various phonetic environments. Some phonemes are strongly co-articulated or need allophones to preserve intelligibility and naturalness. Also, when combining synthesis units to synthesize a word, sentence or text, smoothing is required to avoid a perceptual discontinuity between connected frames caused by variations in vocal effort.

VCV (Vowel Consonant Vowel)-based diphone

In this proposal, diphones are adapted to include co-articulations or allophonic features by adding entries for specific phonetic environments. In a mora-based phonetic system such as Japanese, these problems are solved by extracting parameters from VCV segments without losing freedom of duration control. In a VCV-based diphone set, only a pair of (V1C) and (CV2) diphones taken from the same V1CV2 segment can be connected with each other at the consonant portion. For example, with the 5 Japanese vowels /a,e,i,o,u/ and the consonant /r/, 5 different kinds of (ar) diphone must be prepared, one for each succeeding vowel, and 5 kinds of (ra) diphone must be prepared, one for each preceding vowel.

(Image Omitted)

Fig. 1 shows an example of the proposed segmentation. In Fig. 1, poin...
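
The inventory rule for VCV-based diphones described above can be illustrated with a minimal Python sketch. It is not part of the disclosure: the names `Diphone`, `build_inventory` and `can_connect` are hypothetical, each phoneme is assumed to be a single-letter symbol, and only labels are enumerated (no acoustic parameters are modelled).

```python
from typing import List, NamedTuple

VOWELS = ["a", "e", "i", "o", "u"]  # the 5 Japanese vowels named in the text


class Diphone(NamedTuple):
    """Hypothetical record for one inventory entry (illustration only)."""
    kind: str    # "VC" for a (V1C) unit, "CV" for a (CV2) unit
    label: str   # e.g. "ar" or "ra"
    source: str  # the V1CV2 segment the unit was cut from, e.g. "ari"


def build_inventory(consonant: str, vowels: List[str] = VOWELS) -> List[Diphone]:
    """Enumerate the (V1C) and (CV2) units for every V1-C-V2 segment."""
    inventory = []
    for v1 in vowels:
        for v2 in vowels:
            source = v1 + consonant + v2
            inventory.append(Diphone("VC", v1 + consonant, source))
            inventory.append(Diphone("CV", consonant + v2, source))
    return inventory


def can_connect(left: Diphone, right: Diphone) -> bool:
    """A (V1C) unit may be joined to a (CV2) unit at the consonant
    only when both were cut from the same V1CV2 segment."""
    return (left.kind == "VC" and right.kind == "CV"
            and left.source == right.source)


if __name__ == "__main__":
    inv = build_inventory("r")
    ar_variants = [d for d in inv if d.label == "ar"]
    ra_variants = [d for d in inv if d.label == "ra"]
    print(len(inv), len(ar_variants), len(ra_variants))
```

For the consonant /r/ and five vowels, this enumerates 25 V1CV2 segments and therefore 50 entries, of which 5 are labelled (ar), one per succeeding vowel, and 5 are labelled (ra), one per preceding vowel. Keeping the source segment on each entry is one simple way to enforce the same-segment connection constraint.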