
Generation of "H" Sounds in Text-To-Speech Synthesis

IP.com Disclosure Number: IPCOM000060820D
Original Publication Date: 1986-May-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 2 page(s) / 21K

Publishing Venue

IBM

Related People

Nartey, JNA: AUTHOR [+2]

Abstract

The present invention relates to a method for producing high-quality "H" sounds in a speech synthesizer. Because many speech synthesis systems construct utterances from a database of stored steady-state sounds (phonemes), or transitions between steady-states (diphones), it is necessary to have a steady-state description for each sound. However, the /h/ sound is so influenced by the characteristics of its surrounding sounds that it cannot be defined and stored as a steady-state phonemic unit on its own. Similarly, in the case of diphone synthesis, this chameleon effect makes it impossible to define transitions to a generic steady-state "H". A method for producing high-quality "H" sounds using diphones as primary units is now described, and the same underlying principle could be applied to phoneme synthesis as well.

This is the abbreviated version, containing approximately 54% of the total text.


The present invention relates to a method for producing high-quality "H" sounds in a speech synthesizer. Because many speech synthesis systems construct utterances from a database of stored steady-state sounds (phonemes), or transitions between steady-states (diphones), it is necessary to have a steady-state description for each sound. However, the /h/ sound is so influenced by the characteristics of its surrounding sounds that it cannot be defined and stored as a steady-state phonemic unit on its own. Similarly, in the case of diphone synthesis, this chameleon effect makes it impossible to define transitions to a generic steady-state "H". A method for producing high-quality "H" sounds using diphones as primary units is now described, and the same underlying principle could be applied to phoneme synthesis as well.

In brief, the input string of diphones is scanned for the presence of the "H" sound. When it is found, the preceding sound is tapered to silence, a transition state is constructed from an already existing unit, and the following sound is started with a gradual onset from silence.

The method can be defined more rigorously and more generally in terms of the following diphone notation. Each diphone is represented as a pair-transition p(n):p(n+1), n = 1,3,5,... The string of diphones making up an utterance is scanned until p(n+1)="HX", at which point new diphones are inserted. By way of example, suppose ther...
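
The scan-and-insert step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosure's actual procedure: the available text is cut off before it names the inserted diphones, so the silence symbol "SIL", the function name expand_h, and the caller-supplied make_h_unit callback (standing in for "a transition state constructed from an already existing unit") are all hypothetical.

```python
from typing import Callable, List, Tuple

# A diphone is the pair p(n):p(n+1) in the notation of the text.
Diphone = Tuple[str, str]

H = "HX"     # symbol for the "H" sound, as used in the text
SIL = "SIL"  # hypothetical symbol for silence (not named in the source)


def expand_h(
    diphones: List[Diphone],
    make_h_unit: Callable[[str, str], Diphone],
) -> List[Diphone]:
    """Scan a diphone string and rewrite each passage through "HX".

    make_h_unit(preceding, following) is a hypothetical hook that returns
    the transition unit to reuse; the source does not say which unit it is.
    """
    out: List[Diphone] = []
    i = 0
    while i < len(diphones):
        left, right = diphones[i]
        next_starts_with_h = (
            i + 1 < len(diphones) and diphones[i + 1][0] == H
        )
        if right == H and next_starts_with_h:
            following = diphones[i + 1][1]
            out.append((left, SIL))                   # taper preceding sound to silence
            out.append(make_h_unit(left, following))  # transition built from an existing unit
            out.append((SIL, following))              # gradual onset from silence
            i += 2  # both diphones touching "HX" have been consumed
        else:
            out.append((left, right))
            i += 1
    return out
```

For instance, calling expand_h([("AX", "HX"), ("HX", "EH"), ("EH", "D")], make_h_unit) replaces the two diphones that touch "HX" with three units and leaves ("EH", "D") untouched. Passing the unit choice in as a callback keeps the sketch neutral about which stored unit the method actually reuses.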