Automatic Determination of Pronunciation of Words From Their Spellings

IP.com Disclosure Number: IPCOM000100123D
Original Publication Date: 1990-Mar-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 5 page(s) / 198K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

A technique is described whereby speech recognition devices can automatically determine the pronunciation of words from their spellings. Given the spelling of a previously unseen word, the technique determines the word's possible pronunciations and their probabilities.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 31% of the total text.

      In speech recognition systems which employ Markov word models,
two types of word models (phonetic and fenemic) are prominent.
Phonetic word models, also known as phonetic baseforms, are usually
obtained from phonetic transcriptions as listed in a dictionary, or
as provided by a phonetician. Fenemic word models, also known as
fenemic baseforms, can be obtained automatically from multiple
utterances.

      Generally, when users personalize the recognition vocabulary by
adding their own words, it is desirable to obtain both fenemic and
phonetic baseforms for the added words.  While the fenemic baseforms
can be determined readily from one or more utterances of the added
words, the phonetic baseforms cannot.

      Previous attempts to determine phonetic baseforms automatically
from one or more utterances have had limited success.  These methods
tried to construct phonetic baseforms solely from the acoustic
evidence provided by the utterances.  However, the spelling of a word
provides an additional source of information which, used in
conjunction with the acoustic information, could significantly
improve the quality of automatically generated phonetic baseforms.

      The concept described herein uses the spelling of a word to
determine the word's possible phonetic baseforms, together with their
prior probabilities.  The concept may, therefore, serve as the
language model component of a speech recognition system whose task
is to decode a phoneme sequence (i.e., a phonetic baseform) from one
or more feneme sequences (i.e., one or more utterances of the subject
word).
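The language-model role can be written as a decision rule. This is a sketch under the usual noisy-channel assumptions, and the symbols are ours rather than the disclosure's. Let S be the spelling, B a candidate phonetic baseform, and U_1, ..., U_n the feneme sequences from n utterances. Assume the utterances are conditionally independent of one another and of the spelling, given B. Then the posterior over baseforms factors into the spelling-based prior and the acoustic likelihoods:

```latex
P(B \mid S, U_1, \dots, U_n)
  = \frac{P(B \mid S)\, P(U_1, \dots, U_n \mid B, S)}{P(U_1, \dots, U_n \mid S)}
  \;\propto\; P(B \mid S) \prod_{i=1}^{n} P(U_i \mid B),
\qquad
\hat{B} = \arg\max_{B} \; P(B \mid S) \prod_{i=1}^{n} P(U_i \mid B)
```

Here P(B | S) is the prior that the spelling-based concept supplies. Each P(U_i | B) is the likelihood that the acoustic Markov models assign to one utterance.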

      Attempts to determine spelling-to-phoneme rules automatically
have been made previously (*).  The concept described herein, however,
employs the techniques of question-asking pylonic idiot-systems and
differs from (*) in: a) the manner in which the questions are
determined; b) the manner in which tree growing terminates; and c)
the method of computing the probability distribution at the tree
leaves.
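The general idea of a question-asking tree that yields pronunciation probabilities can be illustrated with a small hand-built example. The questions, phone symbols, and probabilities below are hypothetical. The trees are written by hand rather than grown from training data as the disclosure describes. Whole-word baseforms are scored by multiplying per-letter distributions. That step assumes each letter maps to one phone string independently, given its neighbours, which is a simplification adopted only for this sketch.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# (previous letter, current letter, next letter); "#" marks a word boundary.
Context = Tuple[str, str, str]


class Node:
    """Node of a hypothetical question-asking tree for one letter.

    Internal nodes hold a yes/no question about the letter context;
    leaves hold a probability distribution over phone strings
    ("" means the letter is silent).
    """

    def __init__(self, question: Optional[Callable[[Context], bool]] = None,
                 yes: Optional["Node"] = None, no: Optional["Node"] = None,
                 dist: Optional[Dict[str, float]] = None):
        self.question, self.yes, self.no, self.dist = question, yes, no, dist

    def classify(self, ctx: Context) -> Dict[str, float]:
        node = self
        while node.dist is None:  # descend until a leaf is reached
            node = node.yes if node.question(ctx) else node.no
        return node.dist


def letter_contexts(word: str) -> List[Context]:
    padded = "#" + word + "#"
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


# Hypothetical trees: only "c" asks a question (is the next letter e, i, or y?).
TREES: Dict[str, Node] = {
    "c": Node(question=lambda ctx: ctx[2] in {"e", "i", "y"},
              yes=Node(dist={"S": 0.9, "K": 0.1}),
              no=Node(dist={"K": 0.95, "CH": 0.05})),
    "a": Node(dist={"AE": 0.7, "EY": 0.3}),
    "t": Node(dist={"T": 1.0}),
    "e": Node(dist={"": 0.6, "IY": 0.4}),
}


def baseform_probs(word: str, trees: Dict[str, Node]) -> List[Tuple[str, float]]:
    """Score every baseform, assuming letters are independent given their contexts."""
    dists = [trees[ctx[1]].classify(ctx) for ctx in letter_contexts(word)]
    scores: Dict[str, float] = {}
    for choice in product(*(d.items() for d in dists)):
        phones = " ".join(p for p, _ in choice if p)  # silent letters contribute nothing
        prob = 1.0
        for _, q in choice:
            prob *= q
        scores[phones] = scores.get(phones, 0.0) + prob  # merge identical baseforms
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `baseform_probs("cat", TREES)` ranks `K AE T` first with a probability of about 0.665. For `"ace"`, the question at the "c" node answers yes, so the baseforms containing `S` together receive probability 0.9.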

      In early applications of pylonic idiot-systems, the training
data was divided into three parts: a) "construction" data used for
question construction; b) "checking" data used for question
verification and tree termination; and c) "held-out" data used for
estimating the probability distribution at the tree leaves.  A
possible criticism of this early strategy is that it uses the
training data inefficiently, since each part serves only one
purpose.  This is a potentially serious drawback if
there is relatively little training data avail...
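The early three-way strategy can be sketched as follows. This illustrates the criticized scheme under stated assumptions and is not the disclosure's own procedure. The 60/20/20 split, the entropy-reduction criterion for choosing and verifying questions, the `min_gain` threshold, and all function names are hypothetical. Construction data picks the best candidate question. Checking data decides whether that question is kept; if it is rejected, the node stops growing and becomes a leaf. Held-out data supplies the leaf's relative frequencies.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Context = Tuple[str, str, str]   # (previous, current, next) letter
Sample = Tuple[Context, str]     # letter context paired with its phone string
Question = Callable[[Context], bool]


def three_way_split(data: Sequence[Sample], fractions=(0.6, 0.2, 0.2), seed: int = 0):
    """Shuffle, then cut into construction, checking, and held-out parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    a = int(fractions[0] * len(shuffled))
    b = a + int(fractions[1] * len(shuffled))
    return shuffled[:a], shuffled[a:b], shuffled[b:]


def entropy(labels: List[str]) -> float:
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values()) if total else 0.0


def split_entropy(samples: Sequence[Sample], question: Question) -> float:
    """Sample-weighted entropy of the phone labels after asking `question`."""
    if not samples:
        return 0.0
    yes = [p for ctx, p in samples if question(ctx)]
    no = [p for ctx, p in samples if not question(ctx)]
    return (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(samples)


def best_question(construction: Sequence[Sample], candidates: List[Question]) -> Question:
    """Construction data: pick the candidate that leaves the least entropy."""
    return min(candidates, key=lambda q: split_entropy(construction, q))


def accept_question(checking: Sequence[Sample], question: Question,
                    min_gain: float = 0.01) -> bool:
    """Checking data: keep the question only if it also helps on data not used
    to choose it; otherwise the node becomes a leaf."""
    before = entropy([p for _, p in checking])
    return before - split_entropy(checking, question) > min_gain


def leaf_distribution(held_out: Sequence[Sample]) -> Dict[str, float]:
    """Held-out data: relative frequencies of the phone strings at a leaf."""
    counts = Counter(p for _, p in held_out)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}
```

Each sample lands in exactly one of the three parts. Under the assumed fractions, questions are therefore chosen from only about 60% of the data and leaf distributions are estimated from only about 20%. This is the inefficiency the text describes.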