# Conditional Maximum Entropy Models

IP.com Disclosure Number: IPCOM000113890D
Original Publication Date: 1994-Oct-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 50K

IBM

## Related People

Brown, PF: AUTHOR (and 3 others)

## Abstract

The disclosed invention describes a solution to the following problem: let $Y$ and $X_1, X_2, \ldots$ be random variables. Given $n$ predictors for $Y$ in terms of conditional probabilities $q_i(y \mid x_i)$, $i = 1, 2, \ldots, n$, for $Y$ given $X_i$, find a single predictor $p(y \mid x_1, x_2, \ldots, x_n)$. We do not assume we know the joint distributions $q_i(y, x_i)$.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.


The disclosed invention describes a solution to the following problem: let $Y$ and $X_1, X_2, \ldots$ be random variables. Given $n$ predictors for $Y$ in terms of conditional probabilities $q_i(y \mid x_i)$, $i = 1, 2, \ldots, n$, for $Y$ given $X_i$, find a single predictor $p(y \mid x_1, x_2, \ldots, x_n)$. We do not assume we know the joint distributions $q_i(y, x_i)$.

An example of such a setup in the field of natural language processing is the following. Given:
1.  A predictor of a part-of-speech tag from the class of a word
2.  A predictor of a part-of-speech tag from the spelling of a word
find a predictor of a tag using both the class and the spelling of a word.
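To make the setup concrete, here is a toy sketch with made-up numbers: two conditional predictors of a tag, one from the word's class and one from its spelling. The renormalized product shown at the end is a naive combination rule, *not* the disclosed method; the disclosure instead derives the combined predictor from the maximum entropy principle described below.

```python
# Illustrative toy setup (hypothetical numbers): two predictors of a
# part-of-speech tag for a single word.
q_class = {            # q1(tag | class), for some fixed word class
    "NN": 0.7, "VB": 0.2, "JJ": 0.1,
}
q_spell = {            # q2(tag | spelling), for some fixed spelling
    "NN": 0.3, "VB": 0.6, "JJ": 0.1,
}

# A naive combination (NOT the disclosed method): renormalize the product
# of the two predictors.  This ad hoc rule ignores the dependence between
# class and spelling; the maximum entropy approach below is principled.
raw = {t: q_class[t] * q_spell[t] for t in q_class}
z = sum(raw.values())
p_naive = {t: v / z for t, v in raw.items()}
```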

We propose a maximum entropy approach to the problem. We assume we have an initial guess $r(x_1, \ldots, x_n, y)$ for $p(x_1, \ldots, x_n, y)$. The maximum entropy philosophy advocates the following: choose the unique $p$ satisfying the constraints $p(y \mid x_i) = q_i(y \mid x_i)$ that is closest to $r$ in the Kullback distance:

$$p = \operatorname{argmin}_{t \in C} D(t \,\|\, r)$$

where $C$ denotes the constraint set, and $D$ is the Kullback distance

$$D(p \,\|\, r) = \sum_z p(z) \log \frac{p(z)}{r(z)}$$

Here $z = (x_1, \ldots, x_n, y)$ runs over all possible values of $(x_1, \ldots, x_n, y)$. Note that if $r$ is uniform then $D(p \,\|\, r) = -H(p) + \text{constant}$, where $H(p)$ is the entropy of $p$; minimizing the Kullback distance to a uniform $r$ is therefore the same as maximizing the entropy of $p$.
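The identity for a uniform $r$ can be checked numerically: over $N$ outcomes the constant is $\log N$. A minimal sketch (the distribution `p` is a made-up example):

```python
import math

def kullback(p, r):
    """Kullback distance D(p || r) = sum_z p(z) * log(p(z) / r(z))."""
    return sum(pz * math.log(pz / rz) for pz, rz in zip(p, r) if pz > 0)

def entropy(p):
    """Shannon entropy H(p) = -sum_z p(z) * log p(z), in nats."""
    return -sum(pz * math.log(pz) for pz in p if pz > 0)

# Hypothetical distribution over N = 4 outcomes.
p = [0.5, 0.25, 0.125, 0.125]
uniform = [0.25] * 4

# D(p || uniform) equals -H(p) + log N, so minimizing the Kullback
# distance to a uniform r maximizes the entropy of p.
d = kullback(p, uniform)
```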

In the remainder, we assume that the initial guess $r$ is of the form

$$r(x_1, \ldots, x_n, y) = r(y) \prod_i r(x_i \mid y)$$
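One natural way to obtain an initial guess of this product form is to estimate each factor by relative frequency from training data (a naive-Bayes-style factorization). A sketch under that assumption; the toy triples and field meanings are made up, and the disclosure itself only assumes $r$ has this product form:

```python
from collections import Counter, defaultdict

# Hypothetical training triples (x1, x2, y), e.g. (word class, suffix, tag).
data = [
    ("N", "-ing", "NN"), ("V", "-ing", "VB"),
    ("V", "-ing", "VB"), ("N", "-s", "NN"),
]

# Estimate r(y) and each r(x_i | y) by relative frequency.
y_counts = Counter(y for *_, y in data)
r_y = {y: c / len(data) for y, c in y_counts.items()}

cond = [defaultdict(Counter) for _ in range(2)]  # cond[i][y][x_i]
for x1, x2, y in data:
    cond[0][y][x1] += 1
    cond[1][y][x2] += 1

def r(xs, y):
    """Initial guess r(x1, ..., xn, y) = r(y) * prod_i r(x_i | y)."""
    prob = r_y.get(y, 0.0)
    for i, x in enumerate(xs):
        total = sum(cond[i][y].values())
        prob *= cond[i][y][x] / total if total else 0.0
    return prob
```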

It is then possible to show that the...