Boundary Detection for Addword through Decoding

IP.com Disclosure Number: IPCOM000104056D
Original Publication Date: 1993-Mar-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 2 page(s) / 60K

Publishing Venue

IBM

Related People

De Gennaro, SV: AUTHOR [+4]

Abstract

Disclosed is a method of establishing the start- and end-time boundaries for a new word to be added to a speech recognition system.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method of establishing the start- and end-time
boundaries for a new word to be added to a speech recognition system.

      Adding a new word to a speech recognition system typically has
two components: 1) the creation of an acoustic model, and 2) the
creation of a linguistic model.  Creation of the acoustic model from
one or more sample utterances of the word requires segmentation of
the speech from the surrounding words or background-noise regions by
identifying the start and end times for the corresponding acoustic
sequence.  These start- and end-time boundaries of the word are
generally required by algorithms used to construct the acoustic
models.  This article describes a method of establishing this
segmentation by matching the new word against a list of known words,
and using the boundaries of the best-match word as estimates of the
boundaries for the new word.
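
The idea can be illustrated with a minimal sketch. It assumes, purely
for illustration, that the decoder returns for each known word a match
score (higher is better) and start- and end-time distributions over
frames.  The names Candidate, peak_frame and estimate_boundaries are
hypothetical and are not part of the disclosed system.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical decoder output for one known word."""
    word: str
    score: float                 # assumed: higher means a better match
    start_dist: Sequence[float]  # assumed: P(word starts at frame t)
    end_dist: Sequence[float]    # assumed: P(word ends at frame t)


def peak_frame(dist: Sequence[float]) -> int:
    """Return the frame index at which the distribution peaks."""
    return max(range(len(dist)), key=lambda t: dist[t])


def estimate_boundaries(candidates: List[Candidate]) -> Tuple[str, int, int]:
    """Pick the best-matching known word and use the peaks of its
    start/end distributions as boundary estimates for the new word."""
    best = max(candidates, key=lambda c: c.score)
    return best.word, peak_frame(best.start_dist), peak_frame(best.end_dist)


if __name__ == "__main__":
    # Toy data, invented for this example only.
    decoded = [
        Candidate("cat", -12.0, [0.1, 0.7, 0.2, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.2, 0.6, 0.2]),
        Candidate("dog", -20.0, [0.8, 0.2, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.9, 0.1, 0.0, 0.0]),
    ]
    print(estimate_boundaries(decoded))
```

The identity of the best-match word is not itself needed afterwards;
only its boundaries are carried over to the new word.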

This segmentation can be established by matching the new word against
a list of known words, as follows:

1.  Find an approximate best-match for the new word, using acoustics
    alone or a combination of acoustics and language model, from the
    set of all known words.

2.  Use the start-time and end-time boundaries of this best-match as
    the estimated boundaries of the new word.  Typically these boundaries
    are represented as probability distributions over time, with the
    most-likely start and end times for the new word corresponding to
    the peak samples in the corresponding distributions.  For
    isolated speech, there may be two end-time boundaries:

    o   Pre-silence: based on the output distribution of the last
    ...