
# Method for Prosody Transition Modelling

IP.com Disclosure Number: IPCOM000016293D
Original Publication Date: 2002-Oct-21
Included in the Prior Art Database: 2003-Jun-21
Document File: 4 page(s) / 115K

IBM

## Abstract

Problem Solved: Prosody modelling plays an important role in generating natural speech in a text-to-speech (TTS) system. Current methods, whether rule-based or statistical, model the prosody features of individual synthesizing units but not the correlations between them. Yet prosody is in many cases a relative concept: for example, high energy or pitch on a syllable does not by itself mean the syllable is stressed; it is stressed only if its energy or pitch is higher than that of its neighbors. This disclosure presents a method for prosody transition modelling that models the correlations between synthesizing units and hence yields more natural and stable prosody. Novelty: A method to model the prosody transitions of synthesizing units. As a detailed component of this method, a detailed pitch transition model at the junction of two voiced synthesizing units.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 40% of the total text.


Problem Solved:

Prosody modelling plays an important role in generating natural speech in a text-to-speech (TTS) system. Current methods, whether rule-based or statistical, model the prosody features of individual synthesizing units but not the correlations between them. Yet prosody is in many cases a relative concept: for example, high energy or pitch on a syllable does not by itself mean the syllable is stressed; it is stressed only if its energy or pitch is higher than that of its neighbors. This disclosure presents a method for prosody transition modelling that models the correlations between synthesizing units and hence yields more natural and stable prosody.

Novelty:

A method to model the prosody transitions of synthesizing units. As a detailed component of this method, a detailed pitch transition model at the junction of two voiced synthesizing units.
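
The abbreviated text does not give the form of the pitch transition model, so the following is only a minimal sketch of one quantity such a model might work with: the pitch change across the junction of two voiced units, expressed in semitones. The function name `junction_pitch_jump`, the per-frame F0 input format, and the choice of semitones are all assumptions made for illustration, not part of the disclosure.

```python
import math
from typing import Sequence


def junction_pitch_jump(left_f0: Sequence[float], right_f0: Sequence[float]) -> float:
    """Pitch change across a junction, in semitones.

    Hypothetical helper: compares the last F0 frame (Hz) of the left voiced
    unit with the first F0 frame (Hz) of the right voiced unit.
    """
    if not left_f0 or not right_f0:
        raise ValueError("both units need at least one voiced F0 frame")
    if left_f0[-1] <= 0 or right_f0[0] <= 0:
        raise ValueError("F0 values must be positive (voiced frames only)")
    # 12 semitones per octave; one octave is a doubling of frequency.
    return 12.0 * math.log2(right_f0[0] / left_f0[-1])


# Illustrative values only: the left unit ends at 120 Hz, the right starts at 240 Hz.
jump = junction_pitch_jump([110.0, 120.0], [240.0, 230.0])
```

A transition model could then describe how this jump is distributed for a given context, rather than describing each unit's pitch in isolation.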

Background:

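[Figure not reproduced: block diagram with input "Text", stages "Text Analysis", "Prosody Model" and "Speech Synthesis", and output "Speech"; remaining labels as extracted: "Phonetic Annotated Parameters", "Text".]

Fig. 1 The overview of a TTS system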

In a general TTS system, the prosody model plays an important role in generating natural-sounding speech. One common approach is to predict target prosody parameters, such as duration, pitch, and energy, for each synthesizing unit. Another approach skips prosody parameter prediction and instead selects samples by matching the prosody structure (context condition matching). This approach in effect embeds the prosody model in the prosody structure matching algorithm, and it presumes that segments with the same prosody structure have the same prosody.

However, these approaches are insufficient for two reasons:
1. Prosody is a relative concept, and prosody perception depends mainly on the differences between speech units in context. So besides modelling the prosody of each synthesizing unit, it is also very important to model the relative prosody parameters.
2. The second method still concerns only the prosody of the individual synthesizing unit; it merely makes it implicit. It also has further disadvantages that make it unstable: incomplete knowledge of the factors that affect prosody, and the limitations of the methods for acquiring the prosody structure. We can define some prosody structure, but we do not know whether it covers all the factors that affect prosody; and for some factors, such as stress, even though we know they are quite important for the prosody of a unit, there is no effective way to obtain such information during text analysis.
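
The first point can be illustrated with a small sketch. It assumes one hypothetical relative measure, each unit's value minus the mean of its immediate neighbors; the disclosure does not specify this measure, and the function name `relative_to_neighbors` and the pitch values are invented for the example.

```python
from typing import List


def relative_to_neighbors(values: List[float]) -> List[float]:
    """Return each unit's value minus the mean of its immediate neighbors.

    Hypothetical relative-prosody measure: a positive result means the unit
    stands out above its context, regardless of the absolute level.
    """
    out = []
    for i, v in enumerate(values):
        neighbors = []
        if i > 0:
            neighbors.append(values[i - 1])
        if i < len(values) - 1:
            neighbors.append(values[i + 1])
        # A unit with no neighbors is compared with itself, giving 0.
        baseline = sum(neighbors) / len(neighbors) if neighbors else v
        out.append(v - baseline)
    return out


# Illustrative mean pitch (Hz) per syllable for two utterances.
low_voice = [100.0, 130.0, 105.0]
high_voice = [200.0, 205.0, 210.0]

print(relative_to_neighbors(low_voice))   # [-30.0, 27.5, -25.0]
print(relative_to_neighbors(high_voice))  # [-5.0, 0.0, 5.0]
```

The middle syllable of `low_voice` stands out from its context (+27.5 Hz) even though its absolute pitch of 130 Hz is lower than every syllable in `high_voice`, where the middle syllable does not stand out at all. An absolute per-unit target alone would not capture this difference.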
