Determining Word Boundaries in an Utterance Consisting of N Repetitions of the Same Word

IP.com Disclosure Number: IPCOM000113233D
Original Publication Date: 1994-Jul-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 55K

Publishing Venue

IBM

Related People

Epstein, M: AUTHOR

Abstract

In order to build good baseforms for words not in a speech recognizer's vocabulary, 3-4 pronunciations should be given. If these pronunciations are spoken one after another in a single utterance, a reliable way of locating the beginning and end times of the words is needed.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 54% of the total text.

      In order to build good baseforms for words not in a speech
recognizer's vocabulary, 3-4 pronunciations should be given.  If
these pronunciations are spoken one after another in a single
utterance, a reliable way of locating the beginning and end times of
the words is needed.

      The Tangora currently uses a combination of a silence matcher
and a mumble matcher to locate the boundaries of words in an
utterance.  This approach frequently makes errors, since the mumble
matcher interprets noise as speech, and inter-word silences are
sometimes incorrectly interpreted as intra-word silences.  These
errors are currently handled using heuristics to eliminate noise and
inter-word silences.  However, one can do much better when the words
in the utterance are all the same.  Exploiting this fact, one can
discover much more reliable boundaries.
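
      To make the boundary problem concrete, the following is a
minimal sketch of a heuristic segmenter of the general kind described
above.  It is not the Tangora silence matcher or mumble matcher, whose
internals are not given in this text.  It assumes a simple per-frame
energy threshold as a stand-in for both, and the function name, the
parameters and the frame-energy input are all hypothetical.

```python
from typing import List, Tuple


def find_word_segments(
    frame_energies: List[float],
    speech_threshold: float,
    min_word_frames: int,
    max_intraword_gap_frames: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, one pair per word.

    Assumption: a frame counts as speech when its energy is at or above
    speech_threshold. A real matcher would score acoustic models instead.
    """
    # Step 1: collect raw runs of consecutive speech frames.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, energy in enumerate(frame_energies):
        is_speech = energy >= speech_threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(frame_energies)))

    # Step 2: merge runs separated by a short gap, treating the gap as
    # an intra-word silence rather than a pause between words.
    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_intraword_gap_frames:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    # Step 3: drop runs too short to be a word, treating them as noise.
    return [(s, e) for s, e in merged if e - s >= min_word_frames]


# Two words with a one-frame internal gap in the first, plus a noise spike.
energies = [0, 0, 5, 5, 0, 5, 5, 0, 0, 0, 0, 9, 0, 0, 0, 0, 5, 5, 5, 5, 0]
segments = find_word_segments(energies, 1.0, 3, 1)
print(segments)  # [(2, 7), (16, 20)]
```

Both thresholds are fixed guesses, which is where such heuristics go
wrong: a long intra-word silence splits one word in two, and a noise
burst longer than min_word_frames is reported as a word.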

      The current invention proposes that the word be said 3-4 times,
with pauses in between, in one utterance.  The current baseform
building algorithm uses the Tangora fast match, which uses a silence
phone at the end of each baseform [1].  This invention proposes using
the silence matcher at the start of the utterance to find where the
first pronunciation begins.  Then, since the silence phone is at the
end of the baseform that is built, one can build a baseform for the
first pronunciation only, even though more than one pronunciation is
given.  Having...