
Method for the Detection of Spoken Utterance Boundaries Based On Cascaded Finite State Machines

IP.com Disclosure Number: IPCOM000100478D
Original Publication Date: 1990-Apr-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 7 page(s) / 310K

Publishing Venue

IBM

Related People

Grice, DG: AUTHOR (and 4 others)

Abstract

A major factor affecting the performance of an isolated word speech recognizer is the ability of the system to correctly and consistently determine the end-points of an input utterance. This aspect of the recognition system is called the end-point detection procedure. Since the recognition pattern of an input word is constructed between the beginning and ending points, misaligned end-points can account for a majority of word-recognition errors. Connected word recognizers eliminate the critical end-point detection process but introduce other problems such as coarticulation effects. An effective end-point detection mechanism is a critical part of an accurate isolated word speech recognizer.

      The described end-point detector has an organized, flexible
structure that can be modified systematically as iterative empirical
findings accumulate.  This is important because the exact design of a
recognition system may not be clearly understood during development.
The same structure lets the general end-point system be adapted
easily to other applications, and it simplifies customization for a
particular language or environment.  The flexibility of the end-point
decision parameters is therefore an important characteristic of the
function.
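As an illustration of keeping the decision parameters in one adjustable place, the sketch below gathers a set of thresholds into an immutable record and derives an environment-specific variant from a baseline. Every field name and value is hypothetical; the available text does not list the detector's actual parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EndpointParams:
    """Hypothetical tunable thresholds for a cascaded end-point detector."""
    silence_energy: float     # normalized frame energy below this is treated as silence
    speech_energy: float      # normalized frame energy above this is treated as speech
    fricative_zcr: int        # zero crossings per frame suggesting an unvoiced sound
    min_segment_frames: int   # shortest run of speech frames accepted as a segment
    max_gap_frames: int       # longest silent gap allowed inside one word


# Baseline values chosen only for illustration.
BASELINE = EndpointParams(
    silence_energy=0.05,
    speech_energy=0.20,
    fricative_zcr=40,
    min_segment_frames=3,
    max_gap_frames=15,
)

# Variant for a noisier environment: raise the energy thresholds, keep the rest.
NOISY_ROOM = replace(BASELINE, silence_energy=0.10, speech_energy=0.30)
```

Because each variant is derived with `dataclasses.replace`, a test run can change one parameter at a time while every other value stays as in the baseline.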

      Another important characteristic of this end-point algorithm is
its overall hierarchical structure, which provides several benefits.
The approach breaks the total problem down into layers of function.
Each layer acts as a module that performs a specific task and passes
information about its input sequence to the next higher layer.  This
organization helps the designer see more clearly how parameter
modifications affect system performance.
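The layering can be sketched as a chain of generator stages, each consuming the stream produced by the layer below and yielding a more abstract stream upward. The stage functions `quantize` and `run_lengths` are hypothetical stand-ins chosen only to show the data flow; they are not the modules of the disclosure.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator

Layer = Callable[[Iterable], Iterator]


def cascade(source: Iterable, *layers: Layer) -> Iterator:
    """Feed the output stream of each layer into the next higher layer."""
    stream = iter(source)
    for layer in layers:
        stream = layer(stream)
    return stream


def quantize(levels: Iterable[float]) -> Iterator[str]:
    """Lowest layer (hypothetical): map each numeric value to a symbolic token."""
    for value in levels:
        yield "S" if value >= 0.5 else "Q"   # S = speech-like, Q = quiet


def run_lengths(tokens: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Next layer: summarize its input sequence as (token, run length) pairs."""
    for token, run in groupby(tokens):
        yield token, sum(1 for _ in run)


print(list(cascade([0.1, 0.2, 0.7, 0.9, 0.8, 0.1], quantize, run_lengths)))
# [('Q', 2), ('S', 3), ('Q', 1)]
```

Since each stage sees only the stream of the layer directly below it, a change to one stage (for example the 0.5 threshold in `quantize`) can be evaluated without rewriting the others.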

      High Level Structure

      The end-point detection system consists of three cascaded
finite state machines: the current state machine (CSM), the segment
state machine (SSM) and the word state machine (WSM) (Fig. 1).  The
CSM is the lowest layer of the system.  It converts the time-domain
input data into energy and zero-crossing metrics, which are then
quantized into the current input state (CIS).  The CIS is supplied as
input to the SSM.
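A minimal sketch of what the CSM computes, assuming fixed-length non-overlapping frames, mean-square energy normalized to the loudest frame, a zero-crossing count per frame, and a hypothetical three-symbol CIS alphabet (`SIL` for silence, `UNV` for unvoiced, `VOI` for voiced). The thresholds, frame length and alphabet are assumptions; the available excerpt does not state how the CSM quantizes its metrics.

```python
from typing import List, Sequence, Tuple


def frame_metrics(samples: Sequence[float], frame_len: int) -> List[Tuple[float, int]]:
    """Split samples into non-overlapping frames; return (energy, zero crossings) per frame.

    A trailing partial frame is ignored.
    """
    metrics = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len          # mean-square energy
        crossings = sum(1 for a, b in zip(frame, frame[1:])     # sign changes in the frame
                        if (a >= 0) != (b >= 0))
        metrics.append((energy, crossings))
    return metrics


def current_input_states(samples: Sequence[float], frame_len: int = 8,
                         silence: float = 0.05, unvoiced_zcr: int = 4) -> List[str]:
    """Quantize each frame into a hypothetical CIS token: SIL, UNV or VOI."""
    metrics = frame_metrics(samples, frame_len)
    peak = max((e for e, _ in metrics), default=0.0) or 1.0   # normalize to loudest frame
    states = []
    for energy, crossings in metrics:
        if energy / peak < silence:
            states.append("SIL")        # too quiet to be speech
        elif crossings >= unvoiced_zcr:
            states.append("UNV")        # energetic, many sign changes: fricative-like
        else:
            states.append("VOI")        # energetic, few sign changes: voiced-like
    return states


signal = [0.0] * 8 + [0.5, 0.8, 0.9, 0.7, 0.4, 0.2, 0.1, 0.3] + [0.3, -0.3] * 4
print(current_input_states(signal))   # ['SIL', 'VOI', 'UNV']
```

Energy separates speech from silence, while the zero-crossing count separates noisy unvoiced sounds from smoother voiced ones; that is the usual reason both metrics appear together in end-point detectors.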
Information units passed from one finite state machine to another are
referred to as tokens.  The segment state machine therefore accepts
the CIS tokens passed up by the current state machine and uses them
to identify "word segments", which are defined in terms of normalized
"energy" streams.  As the SSM identifies segments, it passes them as
output tokens to the third layer, the WSM.  The WSM classifies the
incoming sequences of segments; certain types of segment sequences
are identified as valid input "words".  T...
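The two upper layers might be sketched as small finite state machines. Here the SSM groups consecutive non-silent CIS tokens into segments and discards runs shorter than `min_frames` (treating them as clicks), and the WSM merges segments separated by short pauses, accepting a merged sequence as a word only if it spans at least `min_word_frames` frames. The CIS alphabet (`SIL`, `UNV`, `VOI`), both rules and all parameter values are hypothetical; the available excerpt does not give the actual segment and word definitions, which the disclosure bases on normalized energy streams.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Segment:
    start: int   # index of the first frame in the segment
    end: int     # index one past the last frame


def segment_state_machine(cis: Iterable[str], min_frames: int = 2) -> Iterator[Segment]:
    """SSM sketch: emit runs of non-silent CIS tokens that are at least min_frames long."""
    state, start, i = "SILENCE", 0, -1
    for i, token in enumerate(cis):
        if state == "SILENCE" and token != "SIL":
            state, start = "SPEECH", i
        elif state == "SPEECH" and token == "SIL":
            state = "SILENCE"
            if i - start >= min_frames:
                yield Segment(start, i)
    if state == "SPEECH" and (i + 1) - start >= min_frames:
        yield Segment(start, i + 1)      # stream ended while still in speech


def word_state_machine(segments: Iterable[Segment], max_gap: int = 3,
                       min_word_frames: int = 4) -> List[Tuple[int, int]]:
    """WSM sketch: merge segments split by short gaps; keep words that are long enough."""
    words: List[Tuple[int, int]] = []
    state, begin, finish = "IDLE", 0, 0

    def close_word() -> None:
        if finish - begin >= min_word_frames:
            words.append((begin, finish))

    for seg in segments:
        if state == "IDLE":
            state, begin, finish = "IN_WORD", seg.start, seg.end
        elif seg.start - finish <= max_gap:
            finish = seg.end             # short pause: the same word continues
        else:
            close_word()                 # long pause: the previous word is complete
            begin, finish = seg.start, seg.end
    if state == "IN_WORD":
        close_word()
    return words


tokens = ["SIL", "VOI", "VOI", "VOI", "SIL", "SIL", "UNV", "UNV",
          "SIL", "SIL", "SIL", "SIL", "SIL", "VOI", "SIL", "SIL"]
print(word_state_machine(segment_state_machine(tokens)))   # [(1, 8)]
```

Each accepted `(start, end)` pair gives the first frame and one-past-the-last frame of a word, that is, the end-points between which a recognizer would build its recognition pattern. In the example, the single `VOI` frame at index 13 is too short to form a segment and is dropped.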