Trainable Nonparametric Language Model

IP.com Disclosure Number: IPCOM000100867D
Original Publication Date: 1990-Jun-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 2 page(s) / 70K

Publishing Venue

IBM

Related People

Burshtein, D: AUTHOR [+3]

Abstract

A language model and training algorithm are given. The metric used incorporates general linguistic facts, while the training algorithm extracts specific linguistic information from data.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Trainable Nonparametric Language Model

      In all databases of interest, long strings of bits (or words)
are unique, and hence it is impossible to learn the probability of a
string from its relative frequency of occurrence.  On the other hand,
there is "side information" about language in the form of syntax and
semantics.  For example, the present trigram model uses (albeit in a
crude way) the fact that words that occurred long ago are less
important for predicting the next word than words that occurred
recently.  This invention has three parts.

      Part 1 treats the problem of unique strings by defining a
probability model wherein the string distribution has a probability
density which one can hope to estimate from data.

      Part 2 shows how to include the "side information" in the
model.

      Part 3 shows how to train the model using a sample of strings.

                            (Image Omitted)

      Part 1. The Model.  For simplicity we consider a bitstring
language, but our invention applies to any size alphabet.  Let x ∈ X,
where X = {0, 1}^B.  Let ρ: X × X → R be any metric on X.  The
diameter of a set of strings A is then just D(A) = sup_{x,y ∈ A} ρ(x, y).
For any positive number a the function t: 2^X → R, t(A) = (D(A))^a,
is a generalized volume that determines a measure m: Q → R defined for
subsets A ∈ Q, the smallest sigma algebra containing the open sets in
X defined by the metric ρ.  Now, let P be any probability measure on X
which vanishes whenever m does.  Then, by the Radon-Nikodym theorem
there is a probability density p with respec...
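As a hypothetical illustration of the Part 1 construction, the sketch below takes the Hamming distance as the metric ρ on X = {0, 1}^B and computes the diameter D(A) and generalized volume t(A) = (D(A))^a for a small set of bitstrings. The choice of the Hamming metric and of the exponent a is ours for illustration; the disclosure leaves the metric general.

```python
from itertools import combinations

def hamming(x, y):
    """Hamming distance: one possible metric rho on bitstrings in {0,1}^B."""
    return sum(a != b for a, b in zip(x, y))

def diameter(A, rho=hamming):
    """D(A) = sup over x, y in A of rho(x, y)."""
    return max((rho(x, y) for x, y in combinations(A, 2)), default=0)

def volume(A, a=1.0, rho=hamming):
    """Generalized volume t(A) = (D(A))^a for a positive exponent a."""
    return diameter(A, rho) ** a

A = ["0000", "0011", "1111"]
print(diameter(A))        # → 4  (between "0000" and "1111")
print(volume(A, a=2.0))   # → 16.0
```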