Tree Based Smoothing Algorithm for a Trigram Language Speech Recognition Model

IP.com Disclosure Number: IPCOM000122669D
Original Publication Date: 1991-Dec-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 4 page(s) / 145K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

A technique is described whereby a tree-based smoothing algorithm provides a means of estimating trigram probabilities. It is an improvement over previous methods in that it leads to a lower perplexity in a natural language speech recognition system.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 43% of the total text.


      In speech recognition systems that incorporate language
models (1,2), an important concern is the estimation of so-called
m-gram probabilities, particularly 3-gram word probabilities.  The
principal difficulty in estimating m-gram probabilities is the
number of m-grams involved, which grows exponentially with m.  For
example, a vocabulary of 5000 words admits 125 billion distinct
word 3-grams.
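
      The arithmetic is easy to check; the short Python snippet
below (an illustration added here, not part of the original
disclosure) computes the count:

```python
# Number of distinct word m-grams grows as V**m for a vocabulary of
# size V; for V = 5000 and m = 3 this is 125 billion.
vocab_size = 5000
num_trigrams = vocab_size ** 3
print(num_trigrams)  # 125000000000
```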

      Word 3-gram probabilities are typically approximated by
weighted linear combinations of 1-gram, 2-gram and 3-gram relative
frequencies, with weights that vary according to the reliability
of the individual components.  Frequent 3-grams (3,4) are
typically estimated by their unadjusted relative frequency.
Infrequent but observed 3-grams are estimated by discounting their
relative frequency to compensate for an upward bias.  Unobserved
3-grams are estimated by distributing the discounted probability
mass over them in proportion to their 2-gram probabilities.  The
2-gram probability, in turn, is estimated in a similar manner,
backing off where necessary to the 1-gram probability.
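
      The back-off scheme above can be sketched in Python.  This
is a simplified Katz-style illustration, not the disclosure's
exact formulation: the absolute discount of 0.5 and the form of
the back-off weight are assumptions, and the full Katz
normalization over unseen continuations is omitted for brevity.

```python
from collections import Counter

def backoff_trigram_prob(w1, w2, w3, tri, bi, uni, total, discount=0.5):
    """Back-off sketch: discount observed 3-gram counts and redistribute
    the freed mass to lower-order estimates.  `tri`, `bi`, `uni` are
    Counters of 3-gram, 2-gram and 1-gram counts; `total` is the number
    of training tokens."""
    c12 = bi[(w1, w2)]
    c123 = tri[(w1, w2, w3)]
    if c123 > 0:
        # Observed 3-gram: discounted relative frequency.
        return (c123 - discount) / c12
    # Mass freed by discounting all observed continuations of (w1, w2).
    seen = [w for (a, b, w) in tri if (a, b) == (w1, w2)]
    alpha = discount * len(seen) / c12 if c12 > 0 else 1.0

    def bigram_prob(u, v):
        # Same scheme one level down: back off from 2-gram to 1-gram.
        c1 = uni[u]
        c_uv = bi[(u, v)]
        if c_uv > 0:
            return (c_uv - discount) / c1
        seen_b = [v2 for (a, v2) in bi if a == u]
        beta = discount * len(seen_b) / c1 if c1 > 0 else 1.0
        return beta * uni[v] / total

    return alpha * bigram_prob(w2, w3)
```

      With counts from the toy corpus "a b c a b d", the observed
3-gram (a, b, c) receives its discounted relative frequency,
while the unseen 3-gram (a, b, a) receives a smaller but nonzero
probability via the 2-gram and 1-gram back-off.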

      Since most possible 3-grams do not occur in the limited
training data available for probability estimation, it is
inevitable, in both approaches, that many 3-gram probabilities are
estimated principally from 2-gram or 1-gram statistics.  Both
smoothing methods are therefore vulnerable to the abrupt jump, and
the consequent drastic loss of information, that occurs when going
from 3-gram to 2-gram statistics and from 2-gram to 1-gram
statistics.  A method that backs off in smaller steps would
therefore be advantageous, as it would minimize the loss of
resolution and information.

      The concept described herein provides a method of backing
off from 3-gram statistics to the leaf of a tree-based language
model.  The advantage of the tree is that the resolution at the
leaves is considerably better than 2-gram, while the resolution at
the root is the same as 1-gram.  At the nodes between leaf and
root, many intermediate levels of resolution are available.
Smoothing can therefore progress in small steps, backing up the
tree until sufficiently high frequencies are obtained.
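
      The backing-up step can be sketched as follows, assuming a
hypothetical node representation in which each tree node stores
the next-word counts pooled over the histories it covers; the
`min_count` threshold is an assumed reliability cutoff, not a
value from the disclosure:

```python
class TreeNode:
    """One node of a binary history-classification tree.  Leaves cover
    narrow history classes; the root pools all histories (1-gram)."""
    def __init__(self, counts, parent=None):
        self.counts = counts              # next-word counts at this node
        self.total = sum(counts.values())
        self.parent = parent

def smoothed_prob(leaf, word, min_count=100):
    """Back up the tree in small steps, from leaf toward root, until a
    node with sufficiently high frequency is found."""
    node = leaf
    while node.parent is not None and node.total < min_count:
        node = node.parent
    if node.total == 0:
        return 0.0
    return node.counts.get(word, 0) / node.total
```

      A leaf with only 2 observations would be judged unreliable
and estimation would back up toward the root, gaining data at the
cost of resolution; a leaf above the threshold is used directly.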

      It is assumed that 3-gram frequencies have been extracted
from some training data, and that a language-model binary-tree
based idiot-system has been constructed (5,6).  It is also assumed
that some held-out data is available for computing the smoothing
parameters.  The held-out data must be independent of the data
used to obtain the 3-gram counts and the idiot-system.
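
      Smoothing parameters of this kind are commonly fit on
held-out data by an EM procedure (deleted interpolation).  The
sketch below illustrates that general technique for a
two-component linear interpolation; the function names and the
two-component form are assumptions for illustration, not the
disclosure's specific procedure.

```python
def fit_interpolation_weight(heldout, p_hi, p_lo, iters=20):
    """EM for the weight lam in the mixture lam*p_hi + (1-lam)*p_lo,
    maximizing the likelihood of the held-out events.  `p_hi` and
    `p_lo` map an event to its probability under each component."""
    lam = 0.5
    for _ in range(iters):
        expected = 0.0
        for ev in heldout:
            num = lam * p_hi(ev)
            den = num + (1.0 - lam) * p_lo(ev)
            if den > 0:
                # Posterior probability that the high-order model
                # generated this event.
                expected += num / den
        lam = expected / len(heldout)
    return lam
```

      When the high-order component consistently assigns higher
probability to the held-out events, the estimated weight moves
toward 1; when it is unreliable, the weight shifts toward the
lower-order component.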

   ...