Conditional Maximum Entropy Models

IP.com Disclosure Number: IPCOM000113890D
Original Publication Date: 1994-Oct-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 50K

Publishing Venue

IBM

Related People

Brown, PF: AUTHOR [+4]

Abstract

The disclosed invention describes a solution to the following problem: let $Y$ and $X_1, X_2, \ldots$ be random variables.  Given $n$ predictors for $Y$ in terms of conditional probabilities $q_i(y \mid x_i)$, $i = 1, 2, \ldots, n$, for $Y$ given $X_i$, find a single predictor $p(y \mid x_1, x_2, \ldots, x_n)$.  We do not assume that we know the joint distributions $q_i(y, x_i)$.


Conditional Maximum Entropy Models

      The disclosed invention describes a solution to the following
problem: let $Y$ and $X_1, X_2, \ldots$ be random variables.  Given
$n$ predictors for $Y$ in terms of conditional probabilities
$q_i(y \mid x_i)$, $i = 1, 2, \ldots, n$, for $Y$ given $X_i$, find a
single predictor $p(y \mid x_1, x_2, \ldots, x_n)$.  We do not assume
that we know the joint distributions $q_i(y, x_i)$.

An example of such a setup in the field of natural language
processing is:
  Given:
  1.  A predictor of a part-of-speech tag from the class of a word
  2.  A predictor of a part-of-speech tag from the spelling of a
      word
  find a predictor of the tag using both the class and the spelling
  of the word; a minimal sketch of this setup follows.
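
In Python, the inputs to this problem might look like the following;
the tag set, word classes, suffixes, and probability values are all
hypothetical, chosen only to show the shape of the given predictors.

  # q1: P(tag | word class), e.g. classes from a clustering of the
  # vocabulary.  All names and numbers below are made up.
  q1 = {
      "class_noun_like": {"NOUN": 0.7, "VERB": 0.2, "ADJ": 0.1},
      "class_verb_like": {"NOUN": 0.2, "VERB": 0.7, "ADJ": 0.1},
  }

  # q2: P(tag | spelling feature), e.g. the word's suffix
  q2 = {
      "ing": {"NOUN": 0.3, "VERB": 0.6, "ADJ": 0.1},
      "ous": {"NOUN": 0.1, "VERB": 0.1, "ADJ": 0.8},
  }

  # Goal: a single predictor p(tag | class, suffix) consistent with
  # both tables, without assuming the joint distributions q_i(tag, x_i).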

We propose a maximum entropy approach to the problem.  We assume we
have an initial guess $r(x_1, \ldots, x_n, y)$ for
$p(x_1, \ldots, x_n, y)$.  The maximum entropy philosophy advocates
the following:
  Choose the unique $p$ satisfying the constraints
  $p(y \mid x_i) = q_i(y \mid x_i)$ that is closest to $r$ in the
  Kullback distance:
    $p = \arg\min_{t \in C} D(t \,\|\, r)$
  where $C$ denotes the constraint set and $D$ is the Kullback
  distance
    $D(p \,\|\, r) = \sum_z p(z) \log \frac{p(z)}{r(z)}$
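
The Kullback distance is straightforward to compute for distributions
given as probability tables.  The following Python sketch (the
function name and example values are ours, for illustration only)
spells out the definition:

  import math

  def kullback_distance(p, r):
      """D(p || r) = sum over z of p(z) * log(p(z) / r(z)).

      p and r map each outcome z to its probability; r must be
      positive wherever p is.
      """
      return sum(pz * math.log(pz / r[z])
                 for z, pz in p.items() if pz > 0)

  # Example: the distance is zero iff p == r, positive otherwise.
  p = {"a": 0.5, "b": 0.5}
  r = {"a": 0.9, "b": 0.1}
  print(kullback_distance(p, r))  # approximately 0.511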

Here $z = (x_1, \ldots, x_n, y)$ runs over all possible values of
$(x_1, \ldots, x_n, y)$.  Note that if $r$ is uniform then
$D(p \,\|\, r) = -H(p) + \text{constant}$, where $H(p)$ is the
entropy of $p$ and the constant is $\log |Z|$, with $|Z|$ the number
of possible values of $z$.
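
A quick numeric check of this identity, using a hypothetical
three-outcome distribution:

  import math

  # Hypothetical distribution p and the uniform initial guess r.
  p = {"a": 0.5, "b": 0.25, "c": 0.25}
  r = {z: 1.0 / len(p) for z in p}

  d = sum(pz * math.log(pz / r[z]) for z, pz in p.items() if pz > 0)
  h = -sum(pz * math.log(pz) for pz in p.values() if pz > 0)

  # D(p || r) = -H(p) + log |Z| when r is uniform over |Z| outcomes.
  assert abs(d - (-h + math.log(len(p)))) < 1e-12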

In the remainder, we assume that the initial guess $r$ is of the form
  $r(x_1, \ldots, x_n, y) = r(y) \prod_i r(x_i \mid y)$
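
Continuing the part-of-speech example, such a product-form guess
could be assembled as below; all component tables $r(y)$ and
$r(x_i \mid y)$ are hypothetical.

  # Hypothetical component tables for the product-form initial guess.
  r_y = {"NOUN": 0.5, "VERB": 0.3, "ADJ": 0.2}

  r_x_given_y = [
      {  # r(x_1 | y): word class given tag
          "NOUN": {"class_noun_like": 0.8, "class_verb_like": 0.2},
          "VERB": {"class_noun_like": 0.3, "class_verb_like": 0.7},
          "ADJ":  {"class_noun_like": 0.6, "class_verb_like": 0.4},
      },
      {  # r(x_2 | y): suffix given tag
          "NOUN": {"ing": 0.4, "ous": 0.6},
          "VERB": {"ing": 0.9, "ous": 0.1},
          "ADJ":  {"ing": 0.2, "ous": 0.8},
      },
  ]

  def r_joint(xs, y):
      """r(x_1, ..., x_n, y) = r(y) * product over i of r(x_i | y)."""
      prob = r_y[y]
      for table, x in zip(r_x_given_y, xs):
          prob *= table[y][x]
      return prob

  print(r_joint(("class_noun_like", "ing"), "NOUN"))  # 0.5*0.8*0.4 = 0.16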

It is then possible to show that the...