
Context Sensitive Fertility Probabilities

IP.com Disclosure Number: IPCOM000111618D
Original Publication Date: 1994-Mar-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 2 page(s) / 78K

Publishing Venue

IBM

Related People

Brown, PF: AUTHOR [+4]

Abstract

Disclosed is a method for constructing a model of the fertility of source-language words in a machine translation system that takes into account the context in which the words appear. In the prior art, the fertility of a source word in a translation system is the number of target words that it generates, and this number is a random variable that depends only on the source word. This invention extends that notion of fertility to a random variable that depends both on the source word and on the context in which that source word appears.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

      Let $(E_1, F_1), (E_2, F_2), \ldots, (E_t, F_t)$ be a collection
of pairs of source sentences $E_i$ and target sentences $F_i$ such
that paired sentences are translations of one another.  Let $l_i$ be
the number of words in $E_i$, and let $e^{(i)}_j$ be the $j$th word of
$E_i$.  If $j < 1$ or $j > l_i$, define $e^{(i)}_j$ to be the special
boundary source word.  Let $A_i$ be the most probable alignment
between $E_i$ and $F_i$, obtained according to the methods described
in Brown et al. [*].  Let $n_{ij}$ be the fertility of $e^{(i)}_j$ in
$A_i$.  Let $\delta(w, ij)$ be 1 if $w = e^{(i)}_j$ and 0 otherwise.
Let $\delta(n, \phi)$ be 1 if $n = \phi$ and 0 otherwise.  Let
$C_k(ij)$ be the sequence of words

          $$C_k(ij) = e^{(i)}_{j-k}\, e^{(i)}_{j-k+1} \cdots e^{(i)}_{j} \cdots e^{(i)}_{j+k-1}\, e^{(i)}_{j+k}$$

The sequence $C_k(ij)$ is called the $k$th-order context of
$e^{(i)}_j$.  The context of a word $e$ at an unspecified position and
of unspecified order is denoted by $C(e)$.  The vocabulary of the
source, which includes the special boundary source word, is $V$.
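The definitions above can be illustrated with a minimal Python sketch. The disclosure obtains $A_i$ from Model 5 but does not fix a data structure, so the following are assumptions: an alignment is a list giving, for each target word, the 1-based position of the source word that generated it (0 if no source word did); the boundary word is the hypothetical token `<B>`; and the helpers `fertilities`, `word`, and `context` are hypothetical names.

```python
from typing import List, Sequence, Tuple

BOUNDARY = "<B>"  # hypothetical token standing in for the special boundary source word


def fertilities(source: Sequence[str], alignment: Sequence[int]) -> List[int]:
    """Return [n_1, ..., n_l]: the number of target words aligned to each source word.

    alignment[m] is the 1-based source position that generated target word m,
    or 0 if no source word generated it (an assumed convention).
    """
    l = len(source)
    counts = [0] * (l + 1)  # index 0 collects target words generated by no source word
    for a in alignment:
        if not 0 <= a <= l:
            raise ValueError(f"alignment position {a} outside 0..{l}")
        counts[a] += 1
    return counts[1:]


def word(source: Sequence[str], j: int) -> str:
    """Return e_j for 1-based j; positions outside 1..l map to the boundary word."""
    if j < 1 or j > len(source):
        return BOUNDARY
    return source[j - 1]


def context(source: Sequence[str], j: int, k: int) -> Tuple[str, ...]:
    """Return the k-th order context C_k(ij): the 2k+1 words e_{j-k} .. e_{j+k}."""
    return tuple(word(source, p) for p in range(j - k, j + k + 1))
```

For example, with source `["the", "program", "ran"]` and alignment `[1, 2, 2, 3]`, `fertilities` returns `[1, 2, 1]`, and `context(source, 1, 1)` returns `("<B>", "the", "program")`, with the boundary word filling the position before the sentence start.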

The method disclosed is as follows.

1.    Determine an initial fertility model for each source word.

2.    Determine a refined fertility model for each source word.

Step 1 of the method is further divided into the following steps:

1.    Obtain a collection of pairs of target and source sentences.

2.    Determine the parameters of Model 5 as described in detail in
    [*].

3.    Using the parameters obtained in the previous step, determine
    the word-by-word alignments $A_i$ for the pairs of source and
    target sentences.

4.    Determine $N(w, \phi)$ according to the formula

          $$N(w, \phi) = \sum_{ij} n_{ij}\, \delta(w, ij)\, \delta(n_{ij}, \phi)$$

5.    Determine $N(w)$ according to the formula

          $$N(w) = \sum_{\phi} N(w, \phi)$$

6.    Determine an initial fertility model, $P^{(0)}(\phi \mid C(w))$,
    according to the formula

          $$P^{(0)}(\phi \mid C(w)) = \frac{N(w, \phi)}{N(w)}$$
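Steps 4 to 6 can be sketched in Python as below. This is an illustration only, assuming each aligned sentence pair has already been reduced to the list of pairs $(e^{(i)}_j, n_{ij})$ for $j = 1, \ldots, l_i$; the function name `initial_fertility_model` is hypothetical. The code follows the formula for $N(w, \phi)$ as written, including the factor $n_{ij}$, so each occurrence of $w$ with fertility $\phi$ contributes $\phi$ to the count and occurrences with fertility 0 contribute nothing.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Iterable, List, Tuple

# Assumed input shape: one list per sentence of (source word e_j, fertility n_ij).
Corpus = Iterable[List[Tuple[str, int]]]


def initial_fertility_model(corpus: Corpus) -> Dict[str, Dict[int, Fraction]]:
    """Return model[w][phi] = N(w, phi) / N(w), following steps 4-6."""
    # Step 4: N(w, phi) = sum over ij of n_ij * delta(w, ij) * delta(n_ij, phi).
    n_w_phi: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        for w, n in sentence:
            # The two deltas select the cell (w, phi = n); n is the n_ij factor.
            n_w_phi[w][n] += n

    model: Dict[str, Dict[int, Fraction]] = {}
    for w, by_phi in n_w_phi.items():
        n_w = sum(by_phi.values())  # Step 5: N(w) = sum over phi of N(w, phi)
        if n_w == 0:
            continue  # every occurrence of w had fertility 0, so the ratio is undefined
        # Step 6: P0(phi | C(w)) = N(w, phi) / N(w), keeping only nonzero cells.
        model[w] = {phi: Fraction(c, n_w) for phi, c in by_phi.items() if c}
    return model
```

For example, if "program" occurs once with fertility 2 and once with fertility 1, then $N(\text{program}, 2) = 2$, $N(\text{program}, 1) = 1$, and $N(\text{program}) = 3$, so the initial model assigns probabilities 2/3 and 1/3. Because the counts depend only on $w$, every context $C(w)$ of a given word receives the same initial distribution. Exact `Fraction` values are used only to keep the arithmetic easy to check.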

      Step 2 of the method is also divided into a num...