Determining Speaker-Dependent Phonetic Baseforms

IP.com Disclosure Number: IPCOM000120607D
Original Publication Date: 1991-May-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 3 page(s) / 128K

Publishing Venue

IBM

Related People

Bahl, L: AUTHOR [+7]

Abstract

Disclosed is a best-first search -1- algorithm for determining speaker-dependent phonetic baseforms given the spelling of the word and one utterance. The search is guided by a weighted sum of the acoustic score -2,3,4- and the spelling-to-sound rule score -5,6- of the baseform indicated by the current node in the search tree.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

      Disclosed is a best-first search -1- algorithm for
determining speaker-dependent phonetic baseforms given the spelling
of the word and one utterance.  The search is guided by a weighted
sum of the acoustic score -2,3,4- and the spelling-to-sound rule
score -5,6- of the baseform indicated by the current node in the
search tree.

      The search is conducted over a space of 126 "letter outputs".
That is, each letter can map to any of 126 different sequences of
phones, depending on the context in which the letter is used.  At
each iteration of the search, the active node with the highest score
is selected, its 126 successors are generated and each is assigned a
score, and the successors are then added to the search tree.

      Thus, each level in the search tree corresponds to a letter in
the word whose baseform is desired.  When a node is expanded, the
level is inspected to find the letter in the word to which this node
corresponds.  The spelling-to-sound rule algorithm is called to
return a probability distribution over the 126 different letter
outputs.  Each of these letter outputs has a different sequence of
phones associated with it.  The phones for each letter output are
appended to the phones corresponding to the letter outputs of the
ancestor nodes in the search tree, and the resulting sequence is
evaluated using a Viterbi or maximum-likelihood forward pass -4-.  If
the forward pass fails because the phones do not agree well with what
was said, then the path is pruned from the search.  A weighted sum
of the acoustic score and the spelling-to-sound rule score gives the
overall score for a node.  This overall score, placed in a single
priority queue -7-, guides the best-first search.
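
      The expansion and scoring loop described above can be sketched
in Python as follows.  This is only an illustrative sketch, not the
disclosed implementation: the names best_first_baseform,
letter_outputs, acoustic_score and weight are hypothetical, scores
are assumed to be log probabilities, the weights are assumed to be w
and 1 - w, and a failed forward pass is assumed to be signalled by
None.  The toy tables stand in for the 126 letter outputs and the
acoustic match.

```python
import heapq
from typing import Callable, Dict, Optional, Tuple

Phones = Tuple[str, ...]


def best_first_baseform(
    word: str,
    letter_outputs: Callable[[str, int], Dict[Phones, float]],
    acoustic_score: Callable[[Phones], Optional[float]],
    weight: float = 0.5,
) -> Optional[Phones]:
    """Return the phones of the first complete node popped, or None."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    # The unique tie counter keeps tuple comparison away from the phones.
    frontier = [(0.0, 0, 0, (), 0.0)]  # (-score, tie, level, phones, rule_logp)
    tie = 0
    while frontier:
        _, _, level, phones, rule_logp = heapq.heappop(frontier)
        if level == len(word):  # one letter output chosen per letter
            return phones
        # Hypothetical rule model: log probability of each letter output.
        for out_phones, logp in letter_outputs(word, level).items():
            child = phones + out_phones
            acoustic = acoustic_score(child)
            if acoustic is None:  # forward pass failed: prune this path
                continue
            child_rule = rule_logp + logp
            # Assumed form of the weighted sum of the two scores.
            score = weight * acoustic + (1.0 - weight) * child_rule
            tie += 1
            heapq.heappush(frontier, (-score, tie, level + 1, child, child_rule))
    return None


# Toy stand-ins (assumptions, two outputs per letter instead of 126).
def toy_outputs(word: str, level: int) -> Dict[Phones, float]:
    table = {
        "a": {("AE",): -0.1, ("EY",): -2.3},
        "b": {("B",): -0.2, (): -1.6},
    }
    return table[word[level]]


def toy_acoustic(phones: Phones) -> Optional[float]:
    target = ("EY", "B")  # what the speaker is pretended to have said
    if phones != target[: len(phones)]:
        return None  # phones disagree with the utterance
    return -1.0 * len(phones)


result = best_first_baseform("ab", toy_outputs, toy_acoustic)
```

      In this toy run the rule model prefers AE for the letter "a",
but the acoustic check prunes that path, so the search settles on
the sequence EY B.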

      Several enhancements are made to the standard best-first search
algorithm to speed the computation and improve the quality of the
results.

      1.   A backtracking best-first search -8- is used so that only
promising descendants of the 126 hypothesized letter outputs are
actually added to the frontier of the search.  This saves
considerable space, and also time, since a significant amount of
bookkeeping code is involved in allocating a node and inserting it
into the tree.

      2.   However, all 126 child nodes of the root, and the best
child node of each of these children, are always added to the
frontier.  This forces the acoustic component of the score to
evaluate at least several phones, thus computing the scores of
these nodes more accurately.

      3.   Since a speech processor (SP) adapter -9- is used on a PC,
rather than calling the SP hardware to evaluate the score of each of
the 126 possible descendants individually, a structure is built for
all 126 nodes and sent in a single call to the SP for evaluation.

      4.   Glottal stops are used to improve the quality of acronyms
-6-.  Similarly, TPHON rules...
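
      Enhancements 1 and 3 above can be illustrated with a short
Python sketch.  It is a sketch under stated assumptions, not the
disclosed code: expand_batched, score_batch and keep are hypothetical
names, "promising" is assumed here to mean the keep highest-scoring
candidates (the disclosure does not define it), and score_batch
stands in for the single call to the SP, returning None for a
candidate whose forward pass fails.

```python
import heapq
from typing import Callable, List, Optional, Sequence, Tuple

Phones = Tuple[str, ...]


def expand_batched(
    candidates: Sequence[Phones],
    score_batch: Callable[[Sequence[Phones]], List[Optional[float]]],
    keep: int,
) -> List[Tuple[float, Phones]]:
    """Score all candidates in one call and keep the `keep` best survivors."""
    # One round trip for the whole batch instead of one per candidate.
    scores = score_batch(candidates)
    # Drop candidates whose evaluation failed (assumed to be None).
    survivors = [(s, c) for s, c in zip(scores, candidates) if s is not None]
    # Only these would be allocated as nodes and added to the frontier.
    return heapq.nlargest(keep, survivors, key=lambda pair: pair[0])


# Toy batch scorer (an assumption) that also counts how often it is called.
calls = []


def toy_score_batch(batch: Sequence[Phones]) -> List[Optional[float]]:
    calls.append(len(batch))
    fixed = {("A",): -3.0, ("B",): None, ("C",): -1.0, ("D",): -2.0}
    return [fixed[c] for c in batch]


kept = expand_batched([("A",), ("B",), ("C",), ("D",)], toy_score_batch, keep=2)
```

      Under the same assumptions, enhancement 2 would correspond to
calling expand_batched with keep equal to the number of candidates
when expanding the root, and with keep=1 for each of the root's
children.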