Creating rules automatically for adapting phonetic forms to a speaker in a TTS system
Original Publication Date: 2004-Sep-02
Included in the Prior Art Database: 2004-Sep-02
Concatenative TTS systems splice together short voice samples extracted from recordings of a real speaker, in order to match a target speaker-independent phonetic transcription derived from the input text. When the target phonetic forms output by the front-end do not match the pronunciations in the speaker's recordings, the output signal is degraded. We use a set of speaker-dependent rules to map the front-end output pronunciations into speaker-adapted ones. The rules are produced from a decision tree trained on the speaker-dependent recorded data.
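As a rough illustration of the idea (not the disclosed implementation), a decision tree can be grown over phonetic contexts, with leaves giving the phone the speaker actually produces; the trained tree is then applied as a rewrite rule to each front-end phone. In the sketch below, all phone symbols, the (prev, phone, next) context features, and the flapping-style examples are invented for illustration only:

```python
import math
from collections import Counter

# Toy (prev, phone, next) contexts -> the phone as this speaker realizes it.
# A real system would train on front-end output aligned with transcriptions
# of the speaker's recordings; this data set is purely hypothetical.
TRAIN = [
    (("#",  "t", "uw"), "t"),
    (("n",  "t", "er"), "dx"),   # e.g. /t/ realized as a flap [dx]
    (("ih", "t", "er"), "dx"),
    (("#",  "t", "r"),  "t"),
    (("ae", "t", "#"),  "t"),
    (("iy", "t", "iy"), "dx"),
    (("s",  "t", "ah"), "t"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(samples, features=(0, 1, 2)):
    """Grow a decision tree over context slots; leaves are output phones."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority                      # leaf: the speaker's phone
    def gain(f):                             # information gain of slot f
        groups = {}
        for x, y in samples:
            groups.setdefault(x[f], []).append(y)
        return entropy(labels) - sum(
            len(g) / len(samples) * entropy(g) for g in groups.values())
    best = max(features, key=gain)
    branches = {}
    for x, y in samples:
        branches.setdefault(x[best], []).append((x, y))
    rest = tuple(f for f in features if f != best)
    return (best, {v: build_tree(s, rest) for v, s in branches.items()},
            majority)                        # fallback for unseen values

def adapt(tree, context):
    """Map one front-end phone (with its context) to the speaker's phone."""
    while isinstance(tree, tuple):
        feat, branches, default = tree
        tree = branches.get(context[feat], default)
    return tree

tree = build_tree(TRAIN)
print(adapt(tree, ("ih", "t", "er")))  # flapped in this invented data: dx
print(adapt(tree, ("#",  "t", "uw")))  # word-initial /t/ kept as-is: t
```

Unseen contexts fall back to the majority phone stored at each internal node, so the mapping is total: every front-end phone receives some speaker-adapted output even when its context never occurred in the recordings.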