Probabilistic Model of Link Grammar

IP.com Disclosure Number: IPCOM000111294D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 8 page(s) / 297K

Publishing Venue

IBM

Related People

Lafferty, JD: AUTHOR

Abstract

This article discloses a new class of language models. The motivation for this class is to use grammar to reduce the relative entropy of natural language. The method derives from link grammar, a context-free formalism for the description of natural language. This article discloses an algorithm for determining maximum-likelihood estimates for the parameters of this class of models. The resulting language models differ from previous models based on stochastic context-free grammars in that they are highly lexical. In particular, they include the familiar n-gram models as a natural subclass.

Probabilistic Model of Link Grammar

      Motivation - Finite-state methods occupy a special position in
the realm of probabilistic models of natural language.  In
particular, the simplicity of the trigram model renders it especially
well-suited to parameter estimation over hundreds of millions of
words of data, resulting in models whose predictive powers have yet
to be seriously contested.  It has only been through variations on
the finite-state theme, as realized in cached models, for example,
that significant improvements have been made.
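
      To make this simplicity concrete, the following Python sketch
computes maximum-likelihood trigram probabilities as relative
frequencies of counts; the counts-based formulation is standard, but
the function and variable names are illustrative rather than taken
from this article.

    from collections import defaultdict

    def train_trigram(corpus):
        # Maximum-likelihood trigram model: P(w | u, v) = c(u, v, w) / c(u, v).
        tri = defaultdict(int)   # trigram counts c(u, v, w)
        ctx = defaultdict(int)   # context counts c(u, v)
        for sentence in corpus:
            tokens = ["<s>", "<s>"] + sentence + ["</s>"]
            for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
                tri[(u, v, w)] += 1
                ctx[(u, v)] += 1
        return lambda u, v, w: tri[(u, v, w)] / ctx[(u, v)] if ctx[(u, v)] else 0.0

    # Example: trained on two sentences, the context ("the", "dog")
    # is followed by "barked" half of the time.
    p = train_trigram([["the", "dog", "barked"], ["the", "dog", "ran"]])
    print(p("the", "dog", "barked"))   # 0.5

In practice such estimates are smoothed, but the parameters remain
simple relative frequencies, which is what makes estimation over
hundreds of millions of words feasible.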

      In the most common probabilistic model of context-free phrase
structure grammar, the parameters are the probabilities P(A → BC | A)
and P(A → w | A), where A, B, and C are
nonterminals and w is a terminal symbol.  For natural language,
experience has shown that this model only weakly captures contextual
dependencies, even if the set of nonterminals is sufficiently rich to
encode lexical information.  More to the point, the cross-entropies
of language models constructed from probabilistic grammars have so
far been well above the cross-entropies of trigram language models.
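
      A small worked example may clarify this parameterization.  In
the sketch below, the probability of a derivation is the product of
the probabilities of the rules it uses, with the rule probabilities
for each nonterminal summing to one; the toy grammar is invented for
illustration and does not come from this article.

    # Toy probabilistic context-free grammar.  For each left-hand
    # side A, the probabilities P(A -> rhs | A) sum to one.
    pcfg = {
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("the", "dog")): 0.6,
        ("NP", ("dogs",)): 0.4,
        ("VP", ("barked",)): 1.0,
    }

    def derivation_prob(rules):
        # Probability of a derivation: the product of its rule probabilities.
        p = 1.0
        for rule in rules:
            p *= pcfg[rule]
        return p

    # Derivation of "the dog barked":
    # S -> NP VP, NP -> the dog, VP -> barked
    print(derivation_prob([("S", ("NP", "VP")),
                           ("NP", ("the", "dog")),
                           ("VP", ("barked",))]))   # 1.0 * 0.6 * 1.0 = 0.6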

      Link grammar is a new context-free formalism for natural
language proposed in [1].  What distinguishes this formalism from
many other context-free models is the absence of explicit
constituents, as well as a high degree of lexicalization.  It is this
latter property which makes link grammar attractive from the point of
view of probabilistic modeling.
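
      The flavor of the formalism can be suggested schematically.  In
link grammar each word carries connector requirements ("+" connectors
link to a word on the right, "-" connectors to a word on the left),
and a sentence is accepted when all connectors are satisfied by a
planar, connected set of links.  The dictionary entries and checking
routine below are a simplified illustration in the spirit of [1], not
the formalism's actual machinery.

    # Simplified link-grammar dictionary: each word lists the
    # connectors it must satisfy ("D" determiner link, "S" subject link).
    lexicon = {
        "the": ["D+"],          # links right to a noun
        "dog": ["D-", "S+"],    # determiner on the left, verb on the right
        "ran": ["S-"],          # takes its subject on the left
    }

    def connectors_satisfied(words, linkage):
        # Every connector of every word must be used by exactly one link.
        need = {(i, c) for i, w in enumerate(words) for c in lexicon[w]}
        used = set()
        for label, left, right in linkage:
            used.add((left, label + "+"))    # left word's right-pointing connector
            used.add((right, label + "-"))   # right word's left-pointing connector
        return need == used

    # A linkage for "the dog ran": D links words 0-1, S links words 1-2.
    print(connectors_satisfied(["the", "dog", "ran"],
                               [("D", 0, 1), ("S", 1, 2)]))   # True

(A full linkage in [1] must also be planar and connected; the check
above covers only connector satisfaction.)  Because the grammar is
stated word by word, this is the sense in which the formalism is
highly lexical.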

      Several grammatical formalisms besides link grammar have been
proposed which are highly lexical.  One such example is lexicalized
tree-adjoining grammar, which is in fact mildly context-sensitive in
generative power.  While this formalism is promising for statistical
language modeling, the relative inefficiency of the training
algorithms limits the scope of the associated models.  In contrast,
the motivation behind constructing a probabilistic model for link
grammar lies in the fact that it is a very simple formalism, for
which there exists an efficient parsing algorithm.  This suggests
that the parameters of a highly lexical model for link grammar might
be estimated on very large amounts of text, resulting in powerful
models of language.

      Link Grammar - The basics of link grammar are described in [1].
Brief...