Fast Traceback in Viterbi Decoding for Speech Recognition

IP.com Disclosure Number: IPCOM000112649D
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 70K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+5]

Abstract

Disclosed is a modification of the Viterbi algorithm that allows faster traceback to extract information once a word is recognized.

Markov Model State Structure - Most isolated-word speech recognition
tasks can be represented graphically by a finite-state graph at the
word level.  An example is shown in the Figure below.

                       ________
                   ___|        |___
                  |   | WORD_1 |   |
     _________    |   |________|   |    ________
    |         |   |                |   |        |
O---| SIL_1   |---O LM2        LM3 O---| SIL_2  |---O
LM1 |_________|   |    ________    |   |________|   LM4
                  |   |        |   |
                  |___| WORD_2 |___|
                      |________|

      In this example, starting in the initial state LM1, there is a
model for silence SIL_1, then either WORD_1 or WORD_2, followed by
another silence SIL_2, ending in the final state LM4.  This
particular graph would be used if the task were to recognize WORD_1 or
WORD_2 spoken in isolation.  This graph is called the language model
graph, and the states LM1, LM2, LM3 and LM4 are called the language
model states.
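      As an illustrative sketch (not part of the original disclosure),
the language model graph above can be stored as a simple list of arcs,
each carrying the "word" model that must be traversed between two
language model states; the arc-list layout and the successors helper
below are assumptions made for the example.

```python
# Illustrative sketch: the language model graph of the Figure as an
# arc list.  Each arc is (source LM state, "word" model, destination).
language_model_arcs = [
    ("LM1", "SIL_1",  "LM2"),   # initial silence
    ("LM2", "WORD_1", "LM3"),   # either word...
    ("LM2", "WORD_2", "LM3"),   # ...may follow the silence
    ("LM3", "SIL_2",  "LM4"),   # final silence
]
initial_state, final_state = "LM1", "LM4"

def successors(state):
    """All (word model, next language model state) pairs leaving a state."""
    return [(w, d) for s, w, d in language_model_arcs if s == state]
```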

      Each of the 4 "words", SIL_1, WORD_1, WORD_2 and SIL_2 can in
turn be represented by a Markov model with its own set of states and
transitions [*].

      Replacing each "word" in the Figure by a Markov model yields a
composite finite-state graph with a large number of states -- the four
language model states labeled LM1, LM2, LM3 and LM4 in the Figure and
all the states in each of the "word" models, the "acoustic states".
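      This expansion can be sketched as follows; the per-word state
counts are invented for the example (real word models would be obtained
from training), and the naming scheme is an assumption, not the
disclosure's representation.

```python
# Illustrative expansion of each "word" arc into a left-to-right chain
# of acoustic states, giving the composite graph described above.
word_model_lengths = {"SIL_1": 3, "WORD_1": 5, "WORD_2": 4, "SIL_2": 3}

arcs = [("LM1", "SIL_1", "LM2"), ("LM2", "WORD_1", "LM3"),
        ("LM2", "WORD_2", "LM3"), ("LM3", "SIL_2", "LM4")]

def composite_states(arcs, lengths):
    """Language model states plus one acoustic state per model position."""
    states = set()
    for src, word, dst in arcs:
        states.update({src, dst})
        states.update(f"{word}.{i}" for i in range(lengths[word]))
    return states
```

With these invented sizes the composite graph has 4 language model
states and 3 + 5 + 4 + 3 = 15 acoustic states, 19 states in total.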

Viterbi Recognition Algorithm - The Viterbi decoding algorithm [*]
finds the highest scoring path through the composite graph, given a
spoken utterance characterized by an acoustic label sequence.  The
Viterbi calculation proceeds time-synchronously; i.e., the Viterbi
score...
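      The time-synchronous Viterbi recursion with conventional
backpointer traceback can be sketched generically as below.  The
function name, the transition-table format and the toy scores are
assumptions made for illustration; the disclosure's fast-traceback
modification itself is not shown, since the text above is truncated.

```python
def viterbi(n_states, transitions, emit_logprob, observations, start=0):
    """Highest scoring state path.  `transitions` maps (src, dst) to a
    transition log probability; `emit_logprob(state, obs)` returns the
    acoustic log score of observing `obs` in `state`."""
    NEG_INF = float("-inf")
    score = [NEG_INF] * n_states
    score[start] = 0.0
    backptrs = []                    # per frame: best predecessor of each state
    for obs in observations:         # time-synchronous: one frame at a time
        new_score = [NEG_INF] * n_states
        ptr = [None] * n_states
        for (src, dst), logp in transitions.items():
            cand = score[src] + logp + emit_logprob(dst, obs)
            if cand > new_score[dst]:
                new_score[dst], ptr[dst] = cand, src
        score = new_score
        backptrs.append(ptr)
    # Conventional traceback: follow the stored backpointers from the
    # highest scoring state at the final frame back to the start.
    best = max(range(n_states), key=score.__getitem__)
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path, score[path[-1]]
```

The traceback loop above is the step the disclosure targets: it walks
one backpointer per frame, so its cost grows with utterance length.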