Phonemic Segmentation Method for Automatic Construction of a Synthesis Unit Inventory

IP.com Disclosure Number: IPCOM000118320D
Original Publication Date: 1996-Dec-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 4 page(s) / 110K

Publishing Venue

IBM

Related People

Saito, T: AUTHOR

Abstract

Disclosed is a method for automatically dividing speech utterances into phonemic segments, which are used as basic elements for constructing synthesis unit inventories in rule-based speech synthesis systems. In this method, a new segmentation parameter called "dynamics of the waveform envelope" is proposed. Its purpose is to reinforce segmentation performance by compensating for the shortcomings of spectral dynamics, a well-known speech segmentation parameter.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method for automatically dividing speech
utterances into phonemic segments, which are used as basic elements
for constructing synthesis unit inventories in rule-based speech
synthesis systems.  In this method, a new segmentation parameter
called "dynamics of the waveform envelope" is proposed.  Its purpose
is to reinforce segmentation performance by compensating for the
shortcomings of spectral dynamics, a well-known speech segmentation
parameter.
  1.  Derivation of the Dynamics of the Waveform Envelope Signal

        First, divide the speech utterance into fixed-length
segments.  The segment length, L, should be chosen so that each
segment contains at least one or two pitch periods; 15-20 ms is
appropriate for this purpose.  Next, find the maximum value in each
segment and obtain the waveform envelope signal, E(i), by linearly
interpolating between the maximum values as follows:
    E(i) = ( (n(j)-i)*Ef(j-1)+(i-n(j-1))*Ef(j) ) / (n(j)-n(j-1))
         ( j*L  <= i < n(j) )
    E(i) = ( (n(j+1)-i)*Ef(j)+(i-n(j))*Ef(j+1) ) / (n(j+1)-n(j))
         ( n(j) <= i < (j+1)*L )
  where
    Ef(j) = max { x(i) },
    n(j)  = argmax { x(i) } (j*L <= i < (j+1)*L, j: frame number)
    x(i): speech signal (i: sample point number)
    L: length of the waveform envelope frame (samples)
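
The envelope derivation above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the function name `waveform_envelope` is hypothetical, the signal is assumed to be a plain list of samples, a trailing partial frame is assumed to contribute no peak, and the envelope is assumed to be held constant before the first peak and after the last one, since the source does not define E(i) there.

```python
def waveform_envelope(x, L):
    """Return E(i): linear interpolation between per-frame maxima of x.

    x: sequence of speech samples; L: envelope frame length in samples.
    """
    if L <= 0:
        raise ValueError("L must be positive")
    n_frames = len(x) // L  # assumption: a trailing partial frame yields no peak
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    # n(j) = argmax of x(i) over frame j, Ef(j) = x(n(j)).
    peak_pos = [max(range(j * L, (j + 1) * L), key=lambda i: x[i])
                for j in range(n_frames)]
    peak_val = [x[n] for n in peak_pos]

    # Assumption: hold the first peak value before n(0) ...
    E = [float(peak_val[0])] * len(x)
    # ... and the last peak value from n(J-1) to the end of the signal.
    for i in range(peak_pos[-1], len(x)):
        E[i] = float(peak_val[-1])

    # Between consecutive peaks, apply the interpolation formula for E(i).
    for k in range(n_frames - 1):
        n0, n1 = peak_pos[k], peak_pos[k + 1]
        for i in range(n0, n1):
            E[i] = ((n1 - i) * peak_val[k] + (i - n0) * peak_val[k + 1]) / (n1 - n0)
    return E
```

Because consecutive peaks always lie in different frames, n(j+1) - n(j) is never zero, so the division is safe. As a usage illustration, for audio sampled at 16 kHz a call such as `waveform_envelope(samples, 320)` would correspond to 20 ms frames.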

      Using E(i), the dynamics of the waveform envelope signal,
De(j), is computed for each analysis frame j as follows:
    De(j) = sum{ m*W(m)*Eav(j+m) } / sum{ m*m*W(m) }
          m=-M,..,+M               m=-M,..,+M
  where
    m: frame offset within the window
    W(m): weighting function (symmetric window assumed, i.e.,
     W(m)=W(-m) )
    2*M+1: window length (frames)
    K: length of analysis frame (samples)
    Eav(j) = sum{ E(i) } / K
          i=j*K,..,(j+1)*K-1
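
The dynamics computation can be sketched as follows, assuming the sums run over frame offsets m = -M, ..., +M around the current frame j, so that De(j) is a weighted regression slope of the frame-averaged envelope. The function names `frame_averages` and `envelope_dynamics` are hypothetical, the default rectangular window and the clamping of out-of-range frames at the utterance edges are assumptions not stated in the source, and a trailing partial frame is ignored.

```python
def frame_averages(E, K):
    """Eav(j): mean of E(i) over analysis frame j of K samples."""
    if K <= 0:
        raise ValueError("K must be positive")
    n_frames = len(E) // K  # assumption: a trailing partial frame is ignored
    return [sum(E[j * K:(j + 1) * K]) / K for j in range(n_frames)]


def envelope_dynamics(Eav, M, W=None):
    """De(j): weighted regression slope of Eav over a window of 2*M+1 frames."""
    if M < 1:
        raise ValueError("M must be at least 1")
    if W is None:
        W = lambda m: 1.0  # assumption: rectangular (hence symmetric) window
    denom = sum(m * m * W(m) for m in range(-M, M + 1))
    J = len(Eav)
    De = []
    for j in range(J):
        num = 0.0
        for m in range(-M, M + 1):
            k = min(max(j + m, 0), J - 1)  # assumption: clamp at utterance edges
            num += m * W(m) * Eav[k]
        De.append(num / denom)
    return De
```

On a linearly rising envelope the interior values of De(j) equal the slope per frame, which is a quick sanity check for any choice of M and symmetric W.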
  2.  Procedure for Automatic Segmentation

The procedure for automatic segmentation of phonemes in speech
utterances is as follows:
      (1) Word boundary detection - The input utterance is
extracted from running speech by detecting the word's starting and
ending positions with a two-level threshold method applied to the
log power of the input speech signal.
      (2) Rough segmentation (DP-based alignment) - The word is
divided into phonemic segments by conductin...