
Estimating Unknown Rule Occurrences in an Incomplete Grammar Of a Natural Language

IP.com Disclosure Number: IPCOM000119651D
Original Publication Date: 1991-Feb-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 1 page(s) / 35K

Publishing Venue

IBM

Related People

Sharman, RA: AUTHOR

Abstract

No grammar of a natural language can ever be complete, because true natural languages are infinite. What is disclosed is a way of approximating the true grammar sufficiently well.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 90% of the total text.

      Assume that a context-free phrase-structure grammar (CF-PSG) of
a natural language has been obtained (see the following article). Its
rules will be of the form A->B C D, where A is the left-hand-side
non-terminal and B C D is a right-hand side made up of non-terminals
and terminals. Such a grammar necessarily omits an infinite number of
low-frequency rules. The following method is one way of estimating
these additional rules while still keeping the grammar size finite.
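As a minimal sketch of how rules of the form A->B C D might be held in memory, the Python below assumes (a convention of this example, not of the disclosure) that a rule is written as the string "A -> B C D" and that any symbol never appearing on a left-hand side is a terminal. The names `Rule`, `parse_rule` and `split_symbols` are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: str              # left-hand-side non-terminal, e.g. "A"
    rhs: tuple[str, ...]  # right-hand side: non-terminals and/or terminals


def parse_rule(text: str) -> Rule:
    """Parse a rule written as 'A -> B C D' (hypothetical textual format)."""
    lhs, rhs = text.split("->")
    return Rule(lhs.strip(), tuple(rhs.split()))


def split_symbols(rules: list[Rule]) -> tuple[set[str], set[str]]:
    """Non-terminals are the symbols that occur on some left-hand side;
    every other right-hand-side symbol is treated as a terminal (assumed convention)."""
    nonterminals = {r.lhs for r in rules}
    symbols = {s for r in rules for s in r.rhs}
    return nonterminals, symbols - nonterminals


# Toy grammar: the disclosure's A->B C D plus illustrative lexical rules.
rules = [parse_rule(t) for t in ["A -> B C D", "B -> b", "C -> c", "D -> d"]]
nonterminals, terminals = split_symbols(rules)
print(sorted(nonterminals), sorted(terminals))
```

Representing a right-hand side as a tuple keeps rules hashable, which is convenient later when checking which symbol combinations have or have not been observed.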
1.   Convert the grammar to Chomsky Normal Form (CNF), a well-known
procedure. For the example above this yields A->B X and X->C D, where
X is a new non-terminal (note that singleton rules, with a single
right-hand-side symbol, need separate attention).
2.   Completion: for every left-hand-side category A, add extra rules
so that, for every pair of symbols Y, Z that has not been observed as
a right-hand side of A, the rule A->Y Z is present.
3.   Assign a probability to each new rule such that, for each
non-terminal, the probabilities of all its rules still sum to 1, and
such that the new set of rules follows a Pareto distribution, as is
typically observed for natural grammars.
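The three steps can be sketched in Python as below. This is an illustration under several assumptions the disclosure does not state: the grammar is a dict mapping each left-hand side to {right-hand-side tuple: probability}, with each left-hand side's probabilities already summing to 1; step 1 performs only the binarization part of CNF conversion (singleton and empty rules are assumed already handled); completion ranges over pairs of non-terminals; and the Pareto distribution is approximated by reserving a mass `unseen_mass` for the new rules and spreading it with a power-law weight rank^(-alpha), continuing the ranks after the observed rules. The function names, `unseen_mass`, `alpha`, the `X1, X2, ...` naming scheme and the toy grammar are all hypothetical.

```python
from itertools import product

# Hypothetical representation: LHS -> {RHS tuple: probability}.
Grammar = dict[str, dict[tuple[str, ...], float]]


def binarize(grammar: Grammar) -> Grammar:
    """Step 1 (binarization part of CNF only): A->B C D becomes A->B X1, X1->C D."""
    out: Grammar = {lhs: {} for lhs in grammar}
    counter = 0
    for lhs, rules in grammar.items():
        for rhs, p in rules.items():
            head = lhs
            while len(rhs) > 2:
                counter += 1
                fresh = f"X{counter}"  # hypothetical name, assumed not to clash
                out[head][(rhs[0], fresh)] = p
                out[fresh] = {}
                head, rhs, p = fresh, rhs[1:], 1.0  # chain rule is deterministic
            out[head][rhs] = p
    return out


def complete(grammar: Grammar) -> dict[str, list[tuple[str, str]]]:
    """Step 2: for every LHS A, list each non-terminal pair (Y, Z) with A->Y Z unobserved."""
    nonterminals = sorted(grammar)
    return {
        lhs: [pair for pair in product(nonterminals, repeat=2) if pair not in rules]
        for lhs, rules in grammar.items()
    }


def assign_probabilities(grammar: Grammar, unseen: dict[str, list[tuple[str, str]]],
                         unseen_mass: float = 0.05, alpha: float = 1.5) -> Grammar:
    """Step 3: keep each LHS summing to 1; new rules get a power-law tail
    over ranks k+1, k+2, ... following the k observed rules."""
    completed: Grammar = {}
    for lhs, rules in grammar.items():
        new_pairs = unseen[lhs]
        mass = unseen_mass if new_pairs else 0.0
        # Rescale the observed rules to leave room for the unseen mass.
        probs = {rhs: p * (1.0 - mass) for rhs, p in rules.items()}
        k = len(rules)
        weights = [(k + i) ** -alpha for i in range(1, len(new_pairs) + 1)]
        total = sum(weights)
        for pair, w in zip(new_pairs, weights):
            probs[pair] = mass * w / total
        completed[lhs] = probs
    return completed


# Toy grammar: the disclosure's A->B C D plus one extra rule and lexical rules.
grammar: Grammar = {
    "A": {("B", "C", "D"): 0.6, ("B", "C"): 0.4},
    "B": {("b",): 1.0},
    "C": {("c",): 1.0},
    "D": {("d",): 1.0},
}
cnf = binarize(grammar)
completed = assign_probabilities(cnf, complete(cnf))
for lhs, probs in completed.items():
    print(lhs, len(probs), round(sum(probs.values()), 6))
```

Because nothing ranks one unseen rule above another, the sketch simply orders them lexicographically; a practical system would need a better-motivated ordering. Completion adds up to |N|^2 rules per left-hand side (N being the set of non-terminals), so the grammar stays finite but grows quadratically.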

      The resulting grammar has, by definition, no missing rules, and
is therefore "complete".  It can be used as a regular CF-PSG by...