Rule Based Speech Synthesis Method Using a Residual Codebook

IP.com Disclosure Number: IPCOM000106889D
Original Publication Date: 1992-Jan-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 4 page(s) / 166K

Publishing Venue

IBM

Related People

Saito, T: AUTHOR

Abstract

A method to synthesize natural-sounding speech for unlimited-vocabulary text by using an effectively-compressed residual source codebook is proposed here.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 37% of the total text.

      BACKGROUND: In rule-based speech synthesis that applies the LPC
(Linear Predictive Coding) speech analysis/synthesis technique, the
use of the LPC residual signal is one of the key issues in improving
the quality of synthetic speech (1-4).  Two substantial problems
remain unsolved in applying the LPC residual signal to rule-based LPC
speech synthesis:
(1) Quality degradation caused by pitch modification

      In most rule-based synthesis methods, a number of speech
synthesis units (usually several hundred) are extracted from actual
speech samples.  To use these units for generating speech for
arbitrary texts, which differ from the sample texts, the original
pitch of the speech synthesis units must be modified to match the
pitch contour of the new text.  The spectral distortion caused by
this pitch modification degrades the quality of the synthetic speech.
This type of degradation is more pronounced in a residual-excited
synthesizer than in a pulse/noise-excited synthesizer, because the
residual signal is fairly sensitive to the original pitch frequency,
whereas the pulse signal is independent of it.
(2) Large data size of the LPC residual signals for speech synthesis
units

      The LPC residual signal is defined as the prediction residue of
LPC analysis.  Storing the original residual data for all the speech
synthesis units is a problem when implementing practical speech
synthesis systems, such as a Digital Signal Processor (DSP)-based
system, which usually does not have enough data memory to store the
residual sources.
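
      As an illustration of the definition above (the prediction
residue of LPC analysis), the following is a minimal sketch and not
the implementation used in the experimental system.  It assumes the
autocorrelation method with the Levinson-Durbin recursion and a
zero initial filter state; the function names, the frame length, and
the analysis order are hypothetical choices made only for this
example.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a[0..order] with a[0] == 1, so that the prediction-error
    filter is A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation lags r[0..order].
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy; assumes a non-silent frame
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(frame, a):
    """Residual e[n] = s[n] + sum_k a[k] * s[n-k] (inverse filtering)."""
    return np.convolve(frame, a)[: len(frame)]


def lpc_synthesize(residual, a):
    """All-pole synthesis s[n] = e[n] - sum_k a[k] * s[n-k]."""
    order = len(a) - 1
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out


if __name__ == "__main__":
    # Hypothetical test frame: two sinusoids plus a little noise.
    rng = np.random.default_rng(0)
    t = np.arange(240)
    s = np.sin(2 * np.pi * 0.05 * t) + 0.01 * rng.standard_normal(240)
    a = lpc_coefficients(s, order=10)
    e = lpc_residual(s, a)
    print("residual/signal energy ratio:", np.sum(e ** 2) / np.sum(s ** 2))
```

      Exciting the synthesis filter with the stored residual
reproduces the original frame, which is why keeping the residual
preserves quality but also why it must be stored sample by sample.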

      This proposal focuses mainly on problem (2) above and, as a
result, also alleviates problem (1) to some extent.  Our experimental
system has about 360 speech synthesis units.  The spectral data (the
basic part of the unit data) occupies 80 KB, whereas the residual
data occupies 480 KB.  To improve speech quality, many more units
must be accumulated so that the synthetic speech reflects minute
contextual effects.  The problem therefore becomes more critical as
quality is improved, because the residual data size increases in
proportion to the number of units.
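
      The figures above can be worked through as simple arithmetic.
The sketch below uses only the numbers quoted in the text (about 360
units, 80 KB of spectral data, 480 KB of residual data) and the stated
proportionality of residual size to unit count; the function name and
the larger unit count in the usage example are hypothetical.

```python
# Figures quoted in the text for the experimental system.
NUM_UNITS = 360          # approximate number of speech synthesis units
SPECTRAL_KB = 80.0       # spectral data size
RESIDUAL_KB = 480.0      # uncompressed residual data size

# Residual data is six times the spectral data in this system.
RESIDUAL_TO_SPECTRAL = RESIDUAL_KB / SPECTRAL_KB

# Average residual storage per unit, in KB.
RESIDUAL_KB_PER_UNIT = RESIDUAL_KB / NUM_UNITS


def residual_kb_for(num_units):
    """Projected uncompressed residual size, assuming it grows in
    proportion to the number of units as stated in the text."""
    return RESIDUAL_KB_PER_UNIT * num_units


if __name__ == "__main__":
    # 1000 units is a hypothetical inventory size, not from the source.
    print(RESIDUAL_TO_SPECTRAL, RESIDUAL_KB_PER_UNIT, residual_kb_for(1000))
```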

      PROPOSED METHOD: Creation of a Codebook for Voiced Residual
Signals

      Given this background, we propose here a method to create an
effectively compressed residual source codebook without degrading the
quality of synthetic speech.  There are two kinds of residual
signals:  voiced and voiceless.  The voiced residual signals account
for 70-80% of the whole residual data.  This proposal concerns only
this dominant part of the residual signals, i.e., the voiced
residual.  As for the residual signals for voiceless speech,...