Phonemic Segmentation Method Based on the Dynamic Feature of Fundamental Frequency Patterns

IP.com Disclosure Number: IPCOM000123001D
Original Publication Date: 1998-Mar-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 2 page(s) / 97K

Publishing Venue

IBM

Related People

Saito, T: AUTHOR

Abstract

Disclosed is a method for automatically dividing speech utterances into phonemic segments, which are used as basic elements for constructing synthesis unit inventories in rule-based speech synthesis systems. The method introduces a new segmentation parameter called "dynamics of fundamental frequency pattern". This parameter is particularly effective for segmenting voiced phonemes.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.

      Derivation of the Dynamics of Fundamental Frequency Pattern

      First, GCI (Glottal Closure Instant) parameters are obtained by
wavelet analysis of a given utterance.  Next, the F0 (fundamental
frequency) pattern is obtained as F0 = 1/T0, where the pitch period
T0 is the interval between adjacent GCIs.  The utterance is then
divided into fixed-length segments called frames, and a smoothed
logarithmic F0 value, SF0(i), is computed for each frame as the mean
of log(F0) within that frame, where "i" denotes the frame number.
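
      The F0 and SF0(i) computation described above can be sketched
as follows.  This is a minimal illustration, not the disclosed
implementation: it assumes the GCI times are already available (the
wavelet analysis is not shown), and the function names, the choice of
stamping each F0 value at the start of its pitch period, and the use
of None for frames without any GCI are all assumptions.

```python
import math


def f0_from_gcis(gci_times):
    """Return (time, F0) pairs from a sorted list of GCI times in seconds.

    Each pitch period T0 is the interval between adjacent GCIs and
    F0 = 1/T0. Stamping F0 at the start of its period is an assumed
    convention, not something stated in the disclosure.
    """
    pairs = []
    for period_start, period_end in zip(gci_times, gci_times[1:]):
        t0 = period_end - period_start
        if t0 > 0:  # skip degenerate intervals
            pairs.append((period_start, 1.0 / t0))
    return pairs


def smoothed_log_f0(f0_pairs, frame_len, num_frames):
    """Return SF0(i): the mean of log(F0) inside each fixed-length frame.

    frame_len is in seconds. Frames that contain no F0 value (for
    example, unvoiced regions) are returned as None.
    """
    sums = [0.0] * num_frames
    counts = [0] * num_frames
    for time, f0 in f0_pairs:
        i = int(time // frame_len)  # frame number for this F0 value
        if 0 <= i < num_frames:
            sums[i] += math.log(f0)
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

      Averaging log(F0) rather than F0 itself keeps the smoothed
value on the logarithmic scale named in the text.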

      Using SF0(i), the dynamics of the F0 pattern, DF(j), is
obtained by solving Equation 1 for each frame.  In Equation 1,
     W(j): weighting function (a symmetric window is assumed, i.e.,
      W(j) = W(-j))
     2*M+1: window length (frames)
     K: length of analysis frame (samples)
     SF0av(j): given by Equation 2.
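
      Equations 1 and 2 are not reproduced in this text, so the
sketch below does not implement them.  It shows only one plausible
kind of windowed dynamics measure built from the quantities listed
above: a weighted squared deviation of SF0 from a weighted local mean
over 2*M+1 frames.  Both formulas, the function names, and the
handling of edge frames are assumptions made for illustration, and
the frame length K is not used here.

```python
def local_mean(sf0, i, weights):
    """Weighted mean of SF0 over the window centred on frame i.

    This stands in for SF0av; its true definition (Equation 2) is not
    available, so the weighted-mean form is an assumption.
    """
    m = len(weights) // 2
    total = sum(weights)
    return sum(w * sf0[i + j] for j, w in zip(range(-m, m + 1), weights)) / total


def f0_dynamics(sf0, weights):
    """Assumed dynamics measure: weighted squared deviation from local mean.

    weights has odd length 2*M+1 and should be symmetric, W(j) = W(-j).
    sf0 must contain numeric values only (voiced frames). Frames closer
    than M to either end have no full window and are returned as None.
    """
    if len(weights) % 2 == 0:
        raise ValueError("window length must be odd (2*M+1)")
    m = len(weights) // 2
    out = [None] * len(sf0)
    for i in range(m, len(sf0) - m):
        avg = local_mean(sf0, i, weights)
        out[i] = sum(
            w * (sf0[i + j] - avg) ** 2
            for j, w in zip(range(-m, m + 1), weights)
        )
    return out
```

      Under these assumptions the measure is zero where SF0 is flat
and rises around frames where SF0 changes, which is the kind of
behaviour a boundary detector for voiced phonemes would look for.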

      Procedure for Automatic Segmentation

      The procedure for automatic segmentation of phonemes in
speech utterances is as follows:
  1.  Word boundary detection.  The input utterance is extracted
       from running speech by detecting a word's starting and
       ending positions by means of a two-level threshold method
       using the log power of the input speech signal.
  2.  Rough segmentation.  The word is divided into phonemic
       segments by conducting DP matching with the reference
       speaker's speech data for the same word.  The phonemic
       boundaries of the...
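
      The word boundary detection of step 1 can be sketched as
follows.  The text names only a two-level threshold method on log
power, so the details here are assumptions: non-overlapping frames,
power expressed in dB, and a common two-level scheme in which the
high threshold locates the core of the word and the low threshold
extends the boundaries outward.  The function names and any
threshold values passed in are hypothetical.

```python
import math


def frame_log_power(samples, frame_len):
    """Log power in dB of consecutive non-overlapping frames."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A small floor avoids log(0) on silent frames.
        powers.append(10.0 * math.log10(energy + 1e-12))
    return powers


def detect_word(log_power, low, high):
    """Return (start_frame, end_frame), inclusive, or None.

    Frames above `high` mark the core of the word; the boundaries are
    then pushed outward while the neighbouring frame stays above `low`.
    """
    core = [i for i, p in enumerate(log_power) if p > high]
    if not core:
        return None  # nothing loud enough to count as a word
    start, end = core[0], core[-1]
    while start > 0 and log_power[start - 1] > low:
        start -= 1
    while end < len(log_power) - 1 and log_power[end + 1] > low:
        end += 1
    return start, end
```

      Using two thresholds lets weak word onsets and endings be kept
without letting low-level background noise alone trigger a detection.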