Procedure for Partitioning a Vocabulary Into Subsets of Semantically Related Words

IP.com Disclosure Number: IPCOM000101061D
Original Publication Date: 1990-Jun-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 3 page(s) / 91K

Publishing Venue

IBM

Related People

Bahl, L: AUTHOR [+4]

Abstract

In the language model component of a natural-language speech recognition system we wish to predict what word will be spoken next from the words already spoken, i.e., we wish to estimate

    Pr(w_n | w_1, w_2, ..., w_{n-1})                                (1)

where w_n denotes the nth word spoken. We may express Equation (1) in terms of semantic classes, where s denotes the (unique) semantic class of w.

Procedure for Partitioning a Vocabulary Into Subsets of Semantically Related Words

       In the language model component of a natural-language
speech recognition system we wish to predict what word will be spoken
next from the words already spoken, i.e., we wish to estimate

            Pr(w_n | w_1, w_2, ..., w_{n-1})                        (1)

where w_n denotes the nth word spoken.  We may express Equation (1)
in terms of semantic classes as

            Pr(w_n | w_1, ..., w_{n-1}) =
                Pr(s_n | w_1, ..., w_{n-1}) Pr(w_n | s_n, w_1, ..., w_{n-1})    (2)

where s denotes the (unique) semantic class of w.

      Although Equation (2) appears to be more complex than the
original expression, Equation (1), its appeal lies in the fact that
the two terms on the right-hand side can be well approximated by
simple expressions.  However, in order to capitalize on these
approximations and simplifications, it is first necessary to
determine suitable semantic classes.
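
      One typical choice of simple approximations for a class-based
model of this kind (stated here as an assumption, not necessarily the
authors' exact formulation) is to keep only the recent class history
in the first term and to drop the word history from the second:

            Pr(s_n | w_1, ..., w_{n-1})       ≈  Pr(s_n | s_{n-1}, s_{n-2}, ...)
            Pr(w_n | s_n, w_1, ..., w_{n-1})  ≈  Pr(w_n | s_n)

so that Equation (2) reduces to a class n-gram probability multiplied
by a class-membership probability, both of which can be estimated
from counts once the classes are known.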

      The following invention describes how to partition a vocabulary
of words into non-overlapping semantic classes suitable for use in
Equation (2).  The method requires a large body of natural-language
text (called the training text), which we will assume is available.
For a 5,000-word vocabulary, 10,000,000 to 100,000,000 words is a
good amount; the more, the better.

      Define the semantic "stickiness" of words w_i and w_j as

            S(i, j) = log [ Pr(w_j | w_i) / Pr(w_j) ]               (3)

where Pr(w_j)
denotes the probability that a randomly selected word from the
training text will be w_j, and Pr(w_j | w_i) denotes the probability
that a randomly selected word from a window centered on w_i will be
w_j.  A typical window would contain about 500 words on each side of
w_i but would exclude the 2 words immediately adjacent to w_i on each
side.  These 4 neighboring words are excluded to avoid contaminating
the semantic associations with syntactic associations.  Thus, the
window would cover about 1,000 words in total, but it would have a
5-word gap in the middle (the 4 excluded neighbors plus w_i itself).
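
      As an illustration of the windowing scheme, the counts needed
for Equation (3) could be collected as follows.  This is only a
sketch; the function and variable names, the list-of-tokens input,
and the simple O(n x window) loop are choices made here, not part of
the disclosure.

          from collections import Counter, defaultdict

          def collect_counts(words, half_window=500, gap=2):
              # words: the training text as a list of word tokens.
              # Counts, for each word w, how often every other word falls
              # inside a window of half_window words on each side of w,
              # excluding the gap (=2) words immediately adjacent to w on
              # each side (and w itself).
              unigram = Counter(words)
              cooc = defaultdict(Counter)
              n = len(words)
              for i, w in enumerate(words):
                  lo = max(0, i - half_window)
                  hi = min(n, i + half_window + 1)
                  for j in range(lo, hi):
                      if abs(j - i) <= gap:   # the 5-word gap in the middle
                          continue
                      cooc[w][words[j]] += 1
              return unigram, cooc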

      The probabilities in Equation (3) can be estimated from word
counts obtained from the training text.
      Observe that S(i,j) = S(j,i) by Bayes' rule.
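
      Continuing the sketch above, and using the log-ratio form of
Equation (3) reconstructed earlier (an assumption), the stickiness of
a pair of words could then be estimated from the counts like this:

          import math

          def stickiness(unigram, cooc, wi, wj):
              # Pr(w_j): relative frequency of w_j in the training text.
              total_words = sum(unigram.values())
              pr_wj = unigram[wj] / total_words
              # Pr(w_j | w_i): fraction of the words falling in windows
              # centered on w_i that are w_j.
              pr_wj_given_wi = cooc[wi][wj] / sum(cooc[wi].values())
              # S(i, j) = log(Pr(w_j | w_i) / Pr(w_j)).  For the true
              # probabilities this is symmetric in i and j by Bayes' rule;
              # count-based estimates are only approximately symmetric
              # because of window edge effects.
              return math.log(pr_wj_given_wi / pr_wj)

      In practice a pair of words never observed in the same window
would make the logarithm undefined, so zero co-occurrence counts must
be smoothed or such pairs assigned a floor value.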

      In keeping with the definition of Equation (3) we m...