Supervised Tying of Speaker-Independent Multonic Markov Word Models

IP.com Disclosure Number: IPCOM000106713D
Original Publication Date: 1993-Dec-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 2 page(s) / 112K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

Multonic hidden Markov word models have proven very useful for the acoustic representation of words in both isolated and connected utterance speech recognition systems. This article proposes a fast algorithm to incorporate supervised tying into the derivation of multonic sub-word models. This approach leads to the efficient growing of a large set of high quality acoustic word templates having an improved capability of modelling variations in pronunciation.

Supervised Tying of Speaker-Independent Multonic Markov Word Models

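      As background for the experiment reported below: tying, in the
hidden Markov model sense, means that distinct sub-word models (or
their states) share parameters, so that the training data for all
members of a tie class is pooled when estimating a single shared
distribution.  The sketch below illustrates this general idea for
discrete-output models with the tie classes given in advance; the
supervised criterion by which the tie classes are chosen, and the
multonic baseform derivation itself, are elided in this abbreviated
text, so the code is a generic illustration rather than the authors'
algorithm.

    import numpy as np

    def tie_output_distributions(counts, tie_class):
        """counts:    (num_states, num_labels) output-label occupancy counts
        tie_class: (num_states,) integer tie-class id for each state
        returns:   (num_states, num_labels) tied output distributions"""
        counts = np.asarray(counts, dtype=float)
        tied = np.empty_like(counts)
        for c in np.unique(tie_class):
            members = tie_class == c
            pooled = counts[members].sum(axis=0)   # pool counts over the class
            tied[members] = pooled / pooled.sum()  # all members share one pmf
        return tied

    # Example: states 0 and 2 are tied into one class; state 1 stands alone.
    counts = [[8, 2, 0],
              [1, 1, 8],
              [4, 0, 2]]
    print(tie_output_distributions(counts, np.array([0, 1, 0])))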

      A pilot experiment was run on a 20,000-word vocabulary,
isolated-utterance, office correspondence task.  A
speaker-independent clustering tree was built using three speakers,
representing a total of about 10,000 sentences of training data, and
this tree was used to derive a set of speaker-independent multonic
baseforms as detailed above.  Two hundred sentences of training data
from a fourth speaker, whose data had not been used to build the
multonic baseforms, were labelled, and this data was used to train
each sub-word Markov model in these multonic baseforms.  Finally, the
associated test set, consisting of 100 sentences containing a total
of 1667 words, was decoded.
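      A minimal structural sketch of this experimental pipeline is
given below.  All function names, signatures, and bodies are
hypothetical placeholders: the clustering-tree construction and the
multonic baseform derivation are described only in the unabridged
text, so each step here simply marks where the corresponding
computation would go.

    def build_clustering_tree(speaker_sentences):
        """Speaker-independent clustering tree built from pooled data
        (three speakers, about 10,000 sentences in the experiment)."""
        return {"num_sentences": sum(len(s) for s in speaker_sentences.values())}

    def derive_multonic_baseforms(tree, vocabulary):
        """One speaker-independent multonic baseform per vocabulary word."""
        return {word: [] for word in vocabulary}          # stub baseforms

    def train_subword_models(baseforms, labelled_sentences):
        """Train each sub-word Markov model on labelled data from a
        held-out speaker (200 sentences from the fourth speaker)."""
        return baseforms                                  # stub re-estimation

    def count_decoding_errors(baseforms, test_sentences):
        """Decode the test set and count word errors (100 sentences,
        1667 words in the experiment)."""
        return 0                                          # stub decoder

    tree = build_clustering_tree({"spk1": [], "spk2": [], "spk3": []})
    baseforms = derive_multonic_baseforms(tree, vocabulary=["example"])
    baseforms = train_subword_models(baseforms, labelled_sentences=[])
    errors = count_decoding_errors(baseforms, test_sentences=[])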

      The results obtained using the method described above were
compared with those obtained using the regular procedure of [4],
which involves standard fenonic baseforms.  The number of errors
observed in the first case was reduced by about 10% as compared to
the number...
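      For reference, the ~10% figure is a relative reduction in the
raw error count on the 1667-word test set.  The arithmetic, with
hypothetical error counts (the actual figures fall in the truncated
portion of the text), works out as follows.

    TEST_WORDS = 1667          # 100 test sentences, from the text above
    baseline_errors = 100      # hypothetical count, fenonic baseforms [4]
    tied_errors = 90           # hypothetical count, supervised tying

    relative_reduction = (baseline_errors - tied_errors) / baseline_errors
    print(f"relative error reduction: {relative_reduction:.0%}")   # -> 10%
    print(f"word error rate: {baseline_errors / TEST_WORDS:.2%}"
          f" -> {tied_errors / TEST_WORDS:.2%}")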