
Method for Prosody Transition Modelling

IP.com Disclosure Number: IPCOM000016293D
Original Publication Date: 2002-Oct-21
Included in the Prior Art Database: 2003-Jun-21
Document File: 4 page(s) / 115K

Publishing Venue

IBM


This is the abbreviated version, containing approximately 40% of the total text.


Problem Solved:

    Prosody modelling plays an important role in generating natural speech in a text-to-speech (TTS) system. Current approaches, whether rule-based or statistical, model the prosody features of individual synthesizing units but not the correlations between them. Yet prosody is in many cases a relative concept: for example, high energy or pitch on a syllable does not by itself mean the syllable is stressed; it is stressed only if its energy or pitch is higher than that of its neighbors. This disclosure presents a method for prosody transition modelling that models the correlations between synthesizing units and hence yields more natural and stable prosody.

Novelty:

A method for modelling the prosody transitions between synthesizing units. As a detailed component of this method, a detailed pitch transition model at the junction of two voiced synthesizing units.

Background:

[Figure: block diagram of a TTS system. Recoverable labels: Text, Text Analysis, Prosody Model, Speech Synthesis, Speech; intermediate labels: Phonetic, Annotated, Parameters, Text]

Fig. 1 Overview of a TTS system

In a general TTS system, the prosody model plays an important role in generating natural-sounding voices. The usual approach is to predict target prosody parameters, such as duration, pitch, and energy, for each synthesizing unit. Another approach skips prosody parameter prediction and instead selects samples by matching the prosody structure (context condition matching). This approach effectively embeds the prosody model in the prosody structure matching algorithm and presumes that segments with the same prosody structure have the same prosody.

However, these approaches are not sufficient, for two reasons:
1. Prosody is a relative concept, and prosody perception depends mainly on the differences between speech units in context. So besides modelling the prosody of each synthesizing unit, it is also very important to model the relative prosody parameters.
2. The second approach still concerns only the prosody of the individual synthesizing unit; it merely makes it implicit. It also has further disadvantages that make it unstable: incomplete knowledge of the factors that affect prosody, and the limitations of the methods for acquiring the prosody structure. We can define some prosody structure, but we do not know whether it covers all the factors that affect prosody; and for some factors, such as stress, even though we know they are quite important for the prosody of a unit, there is no effective way to obtain such information during text analysis.
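The first point can be made concrete with a small sketch. The function below is illustrative only and is not taken from the disclosure: the name relative_pitch, the use of mean pitch per unit in Hz, and the choice of the immediate left and right neighbors as the reference are all assumptions. It shows how the same absolute pitch value can be prominent in one context and not in another.

```python
from typing import List


def relative_pitch(mean_pitch_hz: List[float]) -> List[float]:
    """Return each unit's mean pitch minus the average pitch of its
    immediate neighbors (hypothetical relative-prosody feature)."""
    relative = []
    last = len(mean_pitch_hz) - 1
    for i, pitch in enumerate(mean_pitch_hz):
        neighbors = []
        if i > 0:
            neighbors.append(mean_pitch_hz[i - 1])
        if i < last:
            neighbors.append(mean_pitch_hz[i + 1])
        if neighbors:
            relative.append(pitch - sum(neighbors) / len(neighbors))
        else:
            # A single isolated unit has no context to compare against.
            relative.append(0.0)
    return relative


# The middle unit has the same absolute pitch (220 Hz) in both sequences,
# but it stands above its neighbors only in the first one.
rising_then_falling = relative_pitch([200.0, 220.0, 200.0])
falling_then_rising = relative_pitch([240.0, 220.0, 240.0])
```

Under these assumptions, the middle unit gets a positive relative value in the first sequence and a negative one in the second, even though its absolute pitch is identical.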


Description:

As illustrated in Fig. 2, each synthesizing unit has a prosody target (red line) and an interval prosody target (the range between the blue lines). In addition to these, we can model the jump in prosody between neighboring units. Under this further constraint, the selection cannot pair a segment at the upper blue line of the left unit with a segment at the lower blue line of the right unit, because that jump falls far outside the jump range.

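[Figure: prosody targets of a previous unit and a next unit, with the jump between them marked. Recoverable labels: previous, next, Jump]

Fig. 2 Illustration of pr...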
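The effect of the jump constraint on segment selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's algorithm: the function name select_pair, the representation of each candidate segment by a single pitch value in Hz, the absolute-distance cost to the prosody target, and the exhaustive search over pairs are all hypothetical choices made for the example.

```python
from itertools import product
from typing import List, Optional, Tuple


def select_pair(
    prev_candidates: List[float],
    next_candidates: List[float],
    prev_target: float,
    next_target: float,
    jump_range: Tuple[float, float],
) -> Optional[Tuple[float, float]]:
    """Pick the candidate pair closest to the per-unit targets whose
    jump (next minus previous) lies inside jump_range.

    Returns None when no pair satisfies the jump constraint.
    """
    low, high = jump_range
    best_pair = None
    best_cost = float("inf")
    for prev_pitch, next_pitch in product(prev_candidates, next_candidates):
        jump = next_pitch - prev_pitch
        if not (low <= jump <= high):
            continue  # transition is outside the modelled jump range
        # Assumed cost: summed absolute distance to each unit's target.
        cost = abs(prev_pitch - prev_target) + abs(next_pitch - next_target)
        if cost < best_cost:
            best_pair, best_cost = (prev_pitch, next_pitch), cost
    return best_pair


# Hypothetical candidates near the interval bounds of each unit.
prev_cands = [230.0, 200.0]
next_cands = [210.0, 170.0]

constrained = select_pair(prev_cands, next_cands, 220.0, 180.0, (-30.0, 10.0))
unconstrained = select_pair(prev_cands, next_cands, 220.0, 180.0, (-100.0, 100.0))
```

With these made-up numbers, the pair closest to the individual targets is the upper candidate of the previous unit followed by the lower candidate of the next unit. Its jump of -60 Hz falls outside the assumed jump range, so the constrained selection chooses a different pair, which is the behaviour the description attributes to the jump constraint.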