# Method for Computing the Conditional Distribution of a Word Given the Previous Word in Text

Original Publication Date: 1994-Apr-01

Included in the Prior Art Database: 2005-Mar-26

## Publishing Venue

IBM

## Related People

Brown, PF: AUTHOR [+4]

## Abstract

Disclosed is a method for estimating the conditional probability of a word in text given the previous word in the text. Essen and Steinbiss [*] describe a method for expressing the conditional probability distribution of a word given the preceding word in text as a linear combination of a set of conditional probability distributions. The method described here is an improvement on their scheme.


**This is the abbreviated version, containing approximately 52% of the total text.**


Let $T$ be a sequence of words from a vocabulary $V$ of size $v$. Let $c(w_1 w_2)$ be the number of times that the pair of words $w_1 w_2$ occurs in sequence in $T$, and let $c(w_1 \cdot)$ be the number of times that $w_1$ occurs at the beginning of a sequence of two words in $T$. Similarly, let $c(\cdot\, w_2)$ be the number of times that $w_2$ occurs at the end of a sequence of two words in $T$, and let $c(\cdot\,\cdot)$ be the number of two-word sequences in $T$.
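The four counts can be collected in one pass over $T$. The sketch below assumes that $T$ is a single list of tokens and that its two-word sequences are all adjacent, overlapping pairs; the function name `bigram_counts` and the sample text are hypothetical.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count the two-word sequences in T.

    Returns (pair, left, right, total), where
      pair[(w1, w2)] = c(w1 w2)
      left[w1]       = c(w1 .)
      right[w2]      = c(. w2)
      total          = c(. .)
    """
    pairs = list(zip(tokens, tokens[1:]))  # every adjacent two-word sequence
    pair = Counter(pairs)
    left = Counter(w1 for w1, _ in pairs)   # w1 at the start of a pair
    right = Counter(w2 for _, w2 in pairs)  # w2 at the end of a pair
    return pair, left, right, len(pairs)


T = "the cat sat on the mat".split()
pair, left, right, total = bigram_counts(T)
print(pair[("the", "cat")], left["the"], right["the"], total)  # 1 2 1 5
```

Since every pair has exactly one first and one second word, $\sum_{w_1} c(w_1 \cdot) = \sum_{w_2} c(\cdot\, w_2) = c(\cdot\,\cdot)$.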
Let $S = \{1, 2, \ldots, s\}$ be a set of distribution indices. Let $P_\sigma(w)$, $\sigma \in S$, be a set of probability distributions over the words in the vocabulary, and let $C_w(\sigma)$, $w \in V$, be a set of probability distributions over the indices in $S$. Then a conditional probability distribution of $w_2$ given $w_1$ is given by

$$P(w_2 \mid w_1) \equiv \sum_{\sigma \in S} C_{w_1}(\sigma)\, P_\sigma(w_2).$$
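The mixture can be evaluated directly from the two families of distributions. This is a minimal sketch assuming, for illustration only, that $P_\sigma$ and $C_w$ are stored as nested Python dictionaries; the names `conditional_prob`, `P`, and `C` and the toy numbers are hypothetical.

```python
def conditional_prob(w2, w1, C, P):
    """Return P(w2 | w1) = sum over sigma in S of C_{w1}(sigma) * P_sigma(w2).

    C[w][sigma] holds C_w(sigma); P[sigma][w] holds P_sigma(w).
    """
    return sum(C[w1][sigma] * P[sigma].get(w2, 0.0) for sigma in C[w1])


# Toy example with V = {"a", "b"} and S = {1, 2}; the numbers are made up.
P = {1: {"a": 0.8, "b": 0.2},
     2: {"a": 0.5, "b": 0.5}}
C = {"a": {1: 0.25, 2: 0.75},
     "b": {1: 1.0, 2: 0.0}}

for w1 in ("a", "b"):
    row = {w2: conditional_prob(w2, w1, C, P) for w2 in ("a", "b")}
    print(w1, row)
```

For example, $P(a \mid a) = 0.25 \cdot 0.8 + 0.75 \cdot 0.5 = 0.575$. Because each $C_{w_1}$ sums to 1 over $S$ and each $P_\sigma$ sums to 1 over $V$, every row $P(\cdot \mid w_1)$ also sums to 1.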

According to the method described herein, the distributions $P_\sigma(w)$ and $C_w(\sigma)$ for $s = 2$ are chosen as follows.

1. Set $n = 0$.

2. Set $P_1^{(0)}(w) = c(\cdot\, w) / c(\cdot\,\cdot)$ and $P_2^{(0)}(w) = v^{-1}$.

3. Set $C_w^{(0)}(1) = 0.5 + i_w \epsilon$ and $C_w^{(0)}(2) = 0.5 - i_w \epsilon$, where $i_w$ is chosen randomly to be $+1$ for approximately half of the words in $V$ and $-1$ for the remainder of the words in $V$, and $\epsilon$ is some suitably small number, say 0.1.

4. Determine $J^{(n)}(w_1 \sigma w_2)$ according to the formula

   $$J^{(n)}(w_1 \sigma w_2) = C_{w_1}^{(n)}(\sigma)\, P_\sigma^{(n)}(w_2) \left[ \sum_{\sigma' \in S} C_{w_1}^{(n)}(\sigma')\, P_{\sigma'}^{(n)}(w_2) \right]^{-1}.$$

5. Determine $N^{(n)}(w_1 \sigma w_2)$ according to the formula

   $$N^{(n)}(w_1 \sigma w_2) = c(w_{\ldots}$$
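Steps 1 through 4 can be sketched in Python as follows; the formula for $N^{(n)}$ in step 5 is cut off in this copy, so the sketch stops at $J^{(n)}$. Assumptions, all hypothetical: the function names `initialize` and `posterior_J`, the `seed` parameter, and the choice to realize "approximately half" by shuffling the vocabulary and assigning $i_w = +1$ to the first $\lfloor v/2 \rfloor$ words. Step 4 normalizes $C_{w_1}^{(n)}(\sigma) P_\sigma^{(n)}(w_2)$ over $S$, so $J^{(n)}$ can be read as the posterior weight of index $\sigma$ for the pair $w_1 w_2$, much like the E-step of an EM procedure.

```python
import random
from collections import Counter


def initialize(vocab, right, total, epsilon=0.1, seed=0):
    """Steps 1-3 for s = 2. Returns (n, P, C) with P[sigma][w] and C[w][sigma]."""
    v = len(vocab)
    n = 0  # step 1
    # Step 2: P_1^(0)(w) = c(. w) / c(. .) and P_2^(0)(w) = 1 / v.
    P = {
        1: {w: right.get(w, 0) / total for w in vocab},
        2: {w: 1.0 / v for w in vocab},
    }
    # Step 3: i_w = +1 for about half of the words and -1 for the rest
    # (assumption: shuffle the vocabulary, then split it in two).
    words = sorted(vocab)
    random.Random(seed).shuffle(words)
    half = len(words) // 2
    C = {}
    for k, w in enumerate(words):
        i_w = 1 if k < half else -1
        C[w] = {1: 0.5 + i_w * epsilon, 2: 0.5 - i_w * epsilon}
    return n, P, C


def posterior_J(w1, w2, C, P):
    """Step 4: J^(n)(w1 sigma w2) for each sigma, i.e. C * P normalized over S."""
    weights = {sigma: C[w1][sigma] * P[sigma][w2] for sigma in C[w1]}
    norm = sum(weights.values())
    return {sigma: wt / norm for sigma, wt in weights.items()}


# Usage on a toy text (hypothetical data).
T = "the cat sat on the mat the end".split()
vocab = set(T)
right = Counter(T[1:])  # c(. w): times w is the second word of a pair
total = len(T) - 1      # c(. .): number of two-word sequences
n, P, C = initialize(vocab, right, total)
print(posterior_J("the", "cat", C, P))
```

Because $P_2^{(0)}$ is uniform and nonzero for every word, the normalizer in step 4 is always positive at $n = 0$.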