Fenemic Phones and Their Sisters: Topology and Design of Markov Source Models With Minimum Length Constraints for Use in a Speech Recognition System

IP.com Disclosure Number: IPCOM000121429D
Original Publication Date: 1991-Sep-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 2 page(s) / 77K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+3]

Abstract

In one prominent approach to speech recognition, the pronunciation of a word is represented by a Markov source model which consists of one or more so-called leafforms. Each leafform typically models the pronunciation of one phoneme and consists of a sequence of subword Markov models called fenemic phones. The most common topology of a fenemic phone is a two-state model with three arcs emanating from the first state: a non-deletable arc leading back to the first state, a non-deletable arc leading to the second state, and a deletable arc leading to the second state.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      In one prominent approach to speech recognition, the
pronunciation of a word is represented by a Markov source model which
consists of one or more so-called leafforms. Each leafform typically
models the pronunciation of one phoneme and consists of a sequence of
subword Markov models called fenemic phones.  The most common
topology of a fenemic phone is a two-state model with three arcs
emanating from the first state:  a non-deletable arc leading back to
the first state, a non-deletable arc leading to the second state, and
a deletable arc leading to the second state.
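
      The topology just described can be sketched in code.  The
following is a minimal illustration only, not the disclosure's
implementation: the class name, field names, phone label "F001", and
probability values are hypothetical, and it assumes that each
non-deletable arc produces exactly one output while the deletable arc
produces none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FenemicPhone:
    """Two-state, three-arc fenemic phone (illustrative names only)."""
    name: str
    p_self_loop: float  # non-deletable arc: state 1 -> state 1
    p_forward: float    # non-deletable arc: state 1 -> state 2
    p_delete: float     # deletable arc:     state 1 -> state 2

    def __post_init__(self):
        # The three arcs leave the same state, so they must sum to 1.
        total = self.p_self_loop + self.p_forward + self.p_delete
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"arc probabilities sum to {total}, expected 1")

    def arcs(self):
        """Return (from_state, to_state, emits_output, probability) tuples."""
        return [
            (1, 1, True, self.p_self_loop),
            (1, 2, True, self.p_forward),
            (1, 2, False, self.p_delete),
        ]

    def length_probability(self, n):
        """Probability of reaching state 2 after producing exactly n outputs."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if n == 0:
            # Only the deletable arc reaches state 2 with no output.
            return self.p_delete
        # Either n-1 self-loops then the forward arc,
        # or n self-loops then the deletable arc.
        return (self.p_self_loop ** (n - 1)) * self.p_forward \
            + (self.p_self_loop ** n) * self.p_delete


# Hypothetical phone with made-up probabilities.
phone = FenemicPhone("F001", p_self_loop=0.3, p_forward=0.6, p_delete=0.1)
```

Under these assumptions, `phone.length_probability(0)` equals the
deletable-arc probability, which is the quantity a minimum length
constraint would need to drive to zero.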

      Since the probability of traversing the deletable arc is
usually non-zero, it is possible to get from the first state of a
fenemic phone to the last state without producing any outputs.
Because a leafform is a sequence of these deletable fenemic phones,
it is likewise possible to get from the first state of a leafform to
the last state without producing any outputs.  Thus, a leafform can
usually be deleted entirely, which in turn implies that an entire
word can be deleted.
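
      The argument above can be made concrete with a short
calculation.  This is a sketch under stated assumptions: each phone
is summarized only by its deletable-arc probability, the phones of a
leafform and the leafforms of a word are assumed to be concatenated
in sequence, and the function names and numeric values are
hypothetical.

```python
import math


def leafform_deletion_probability(delete_probs):
    """Probability of crossing every phone of a leafform via its
    deletable arc, i.e. of the leafform producing no output at all."""
    return math.prod(delete_probs)


def word_deletion_probability(leafforms):
    """Probability that every leafform of a word produces no output."""
    return math.prod(leafform_deletion_probability(lf) for lf in leafforms)


# Made-up deletable-arc probabilities for two leafforms of one word.
leafform_a = [0.1, 0.2, 0.05]
leafform_b = [0.1, 0.1]

p_leafform = leafform_deletion_probability(leafform_a)
p_word = word_deletion_probability([leafform_a, leafform_b])
```

As long as every deletable-arc probability is non-zero, both products
are non-zero, so the model assigns some probability to the whole word
producing no output.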

      In real speech, however, content words are generally not
deletable, and in many contexts certain phones (such as stressed
vowels) are not deletable.  The present invention provides a means of
enforcing minimum length constraints on leafforms and, hence, on
words, so as to bring the models more into line with what is observed
in practice.

      We will assume the existence of a set of leafforms expressed in
terms of an alphabet of two-state three-arc fenemic phones.  The
following steps are performed.
(1)  For each phone in the alphabet of fenemic phones, create a
"sist...