Automatic Speech Segment Boundary Detection Using Markov Models

IP.com Disclosure Number: IPCOM000102657D
Original Publication Date: 1990-Dec-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 2 page(s) / 55K

Publishing Venue

IBM

Related People

De Gennaro, S: AUTHOR [+2]

Abstract

Disclosed is a method of automatically determining the start and end times of isolated speech segments using probabilistic models. This technique can be used to segment words and phrases for addition to the active vocabulary of a speech recognizer.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 61% of the total text.

      In a Markov-model-based speech recognizer, words can be
represented as strings of finite-state machines that correspond to
phonetic units and to interword silence intervals.  The segmentation
procedure reuses the silence phone model and statistics employed
during recognition to determine the most likely starting and ending
points of an utterance (1,2,3).

      The segmentation procedure requires that the desired utterance
be bracketed by an initial and a final silence interval.  It further
requires two known times: Ti, which lies within the initial silence
interval, and Tf, which lies within the final silence interval.

      The procedure consists of:
      1. Performing an acoustic match forward in time from Ti, using
a start distribution of an impulse at time Ti.  The peak of the
output distribution from this match defines Ts, the most likely start
time of the utterance.
      2. Performing an acoustic match backward in time from Tf, using
a start distribution of an impulse at time Tf.  The peak of the
output distribution from this match defines Te, the most likely end
time of the utterance.
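
      The two steps above can be illustrated with a minimal sketch.
The disclosure does not give the details of the acoustic match, so
the following is an assumption-laden simplification, not the authors'
implementation: the silence model is reduced to a single state, and
each frame is assumed to carry a hypothetical log-likelihood ratio
(silence versus non-silence) so that the accumulated score peaks at
the silence boundary.  The names frame_scores, output_distribution
and find_boundaries are illustrative only.

```python
def output_distribution(frame_scores, start, stop, step):
    """Accumulate silence scores from an impulse at `start` toward `stop`.

    frame_scores[t] is an ASSUMED per-frame log-likelihood ratio:
    log p(frame | silence) - log p(frame | non-silence).
    step is +1 for a forward match and -1 for a backward match.
    Returns {t: log score that the silence interval extends through t}.
    """
    dist = {}
    total = 0.0
    for t in range(start, stop + step, step):
        total += frame_scores[t]
        dist[t] = total
    return dist


def find_boundaries(frame_scores, t_i, t_f):
    """Return (Ts, Te) as frame indices, given Ti and Tf inside the silences."""
    # Step 1: match forward in time from Ti; the peak marks the last
    # frame of the initial silence, so the utterance starts one frame later.
    forward = output_distribution(frame_scores, t_i, t_f, +1)
    t_s = max(forward, key=forward.get) + 1

    # Step 2: match backward in time from Tf; the peak marks the first
    # frame of the final silence, so the utterance ends one frame earlier.
    backward = output_distribution(frame_scores, t_f, t_i, -1)
    t_e = max(backward, key=backward.get) - 1
    return t_s, t_e


# Toy input (fabricated for illustration): positive scores look like
# silence, negative scores look like speech.
scores = [2.0, 2.0, 2.0, -3.0, -3.0, -3.0, -3.0, 2.0, 2.0, 2.0]
print(find_boundaries(scores, t_i=1, t_f=8))  # (3, 6)
```

      In this toy input the speech occupies frames 3 through 6, and
the peaks of the forward and backward accumulations fall on frames 2
and 7, the silence frames adjacent to the utterance.  A full
recognizer would replace the per-frame ratio with the scores produced
by its own silence phone model.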

 ...