Rule-Based Speech Synthesis Method using Context-Dependent Syllabic Units

IP.com Disclosure Number: IPCOM000117125D
Original Publication Date: 1995-Dec-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 69K

Publishing Venue

IBM

Related People

Hashimoto, Y: AUTHOR [+2]

Abstract

Disclosed is a method for constructing a context-dependent Japanese syllabic unit inventory, which effectively incorporates the spectral influence of the left- and right-hand neighboring phonemes on CV syllabic units by means of statistical analysis. The synthetic speech generated by using the proposed unit inventory, designed for a waveform-concatenation-based TTS (text-to-speech) system, is highly natural and intelligible.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 60% of the total text.

      The Figure shows a block diagram of the process for generating
the proposed units.  The process consists of two main steps: the
generation of phonemic clusters and the generation of
context-dependent syllabic units.  A phonemic cluster is a phone
segment that represents the average spectral dynamics for a given
triphone context.  In the Figure, (V)CV(C) units denote the
context-dependent syllabic units.

      In the first step, a phonemic cluster set is obtained by a
context-dependent clustering method developed for this purpose.  The
clustering algorithm begins by placing all contexts of a phoneme into
a single cluster.  The number of clusters is then increased one at a
time by splitting a cluster in two at the context where the maximum
distortion occurs.  The context of a phoneme is assumed to be
determined by the left- and right-hand neighboring phonemes and their
places of articulation.  The obtained phonemic cluster set
approximates the spectral behavio...
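
      The divisive clustering loop described above (start with one
cluster holding all contexts of a phoneme, then repeatedly split the
cluster with the maximum distortion in two) can be sketched as
follows.  This is only an illustrative sketch, not the disclosed
implementation.  The disclosure does not give the distortion measure
or the splitting rule in this abbreviated text, so the sketch assumes
squared Euclidean distance to the cluster centroid and substitutes a
plain 2-means split.  The names cluster_contexts, split_in_two, and
the (left, right) context tuples are hypothetical, as is the idea of
one spectral feature vector per context.

```python
from typing import Dict, List, Sequence, Tuple

Context = Tuple[str, str]  # hypothetical (left phoneme, right phoneme) key
Features = Dict[Context, Sequence[float]]


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(cluster: List[Context], features: Features) -> float:
    """Sum of squared distances from each member to the cluster centroid."""
    vecs = [features[c] for c in cluster]
    mu = centroid(vecs)
    return sum(sq_dist(v, mu) for v in vecs)


def split_in_two(cluster: List[Context], features: Features,
                 iterations: int = 10) -> Tuple[List[Context], List[Context]]:
    """Assumed split rule: 2-means seeded with two far-apart members."""
    mu = centroid([features[c] for c in cluster])
    seed_a = max(cluster, key=lambda c: sq_dist(features[c], mu))
    seed_b = max(cluster, key=lambda c: sq_dist(features[c], features[seed_a]))
    cen_a, cen_b = list(features[seed_a]), list(features[seed_b])
    part_a: List[Context] = []
    part_b: List[Context] = []
    for _ in range(iterations):
        new_a = [c for c in cluster
                 if sq_dist(features[c], cen_a) <= sq_dist(features[c], cen_b)]
        new_b = [c for c in cluster if c not in new_a]
        if not new_a or not new_b:
            break  # keep the previous partition rather than an empty half
        if new_a == part_a:
            break  # assignments stopped changing
        part_a, part_b = new_a, new_b
        cen_a = centroid([features[c] for c in part_a])
        cen_b = centroid([features[c] for c in part_b])
    return part_a, part_b


def cluster_contexts(features: Features, n_clusters: int) -> List[List[Context]]:
    """Grow the cluster set one at a time by splitting the worst cluster."""
    clusters: List[List[Context]] = [list(features)]  # all contexts together
    while len(clusters) < n_clusters:
        worst = max(range(len(clusters)),
                    key=lambda i: distortion(clusters[i], features))
        if distortion(clusters[worst], features) == 0.0:
            break  # nothing left worth splitting
        part_a, part_b = split_in_two(clusters[worst], features)
        if not part_a or not part_b:
            break
        clusters[worst:worst + 1] = [part_a, part_b]
    return clusters
```

      In this sketch, calling cluster_contexts with a dictionary that
maps each context to a feature vector and a target cluster count
returns a list of context groups.  A real system would presumably
compare spectral trajectories of phone segments rather than single
vectors, and would use the places of articulation of the neighboring
phonemes to decide how a cluster is divided.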