Browse Prior Art Database

A CONSTRUCTIVE METHOD FOR GRAMMATICAL INFERENCE BASED ON CLUSTERING

IP.com Disclosure Number: IPCOM000148995D
Original Publication Date: 1977-Apr-30
Included in the Prior Art Database: 2007-Mar-30
Document File: 52 page(s) / 2M

Publishing Venue

Software Patent Institute

Related People

Liou, Jiunn-I: AUTHOR [+3]

Abstract

A CONSTRUCTIVE METHOD FOR GRAMMATICAL INFERENCE BASED ON CLUSTERING* Research supported in part by National Science Foundation Grant ENG 76-11936 Jiunn-I Liou and Richard Dubes Department of Computer Science Michigan State UniversityEast Lansing, Michigan 48824 April, 1977 A CONSTRUCTIVE METHOD FOR GRAMMATICAL INFERENCE BASED ON CLUSTERING Jiunn-I Liou Richard Dubes Abstract. Given a sample of strings, this paper proposes a procedurefor inferring a finite-state or a context-free p-grammar having minimal complexity whose language contains all sample strings along with strings which are "like" the sample strings. The core of the constructive methodis the analysis of syntactic structure. The given sample is characterizedby a proximity matrix based on the weighted Levenshtein distance between strings. This leads to a labelled minimum spanning tree whose nodes are strings. A clustering algorithm employing syntactic information partitions the sample strings. The resulting clusters emphasize any self-embeddingand recursive structures inherent in the sample. Grammars are inferredfor the clusters and merged, using an "inference tree" with heuristic tac- tics. The resulting candidate p-grammar is evaluated by comparing its language to the sample. One criterion is th~e K-statistic from the Kolmogorov- Smirnoff Maximum deviation test. This constructive method exhibits a reason- able set of a~lternative grammars and avoids the overwhelming computational burden of other methods.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 6% of the total text.

Page 1 of 52

A CONSTRUCTIVE METHOD FOR GRAMMATICAL INFERENCE

BASED ON CLUSTERING*

Jiunn-I Liou and Richard Dubes

Department of Computer Science

  Michigan State University
East Lansing, Michigan
48824

April, 1977

* Research supported in part by National Science Foundation Grant ENG 76-11936

[This page contains 1 picture or other non-text object]

Page 2 of 52

A CONSTRUCTIVE METHOD FOR GRAMMATICAL INFERENCE BASED ON CLUSTERING
Jiunn-I Liou
Richard Dubes

    Abstract. Given a sample of strings, this paper proposes a procedure
for inferring a finite-state or a context-free p-grammar having minimal
complexity whose language contains all sample strings along with strings
which are "like" the sample strings. The core of the constructive method
is the analysis of syntactic structure. The given sample is characterized
by a proximity matrix based on the weighted Levenshtein distance between
strings. This leads to a labelled minimum spanning tree whose nodes are
strings. A clustering algorithm employing syntactic information partitions
the sample strings. The resulting clusters emphasize any self-embedding
and recursive structures inherent in the sample. Grammars are inferred
for the clusters and merged, using an "inference tree" with heuristic tac-
tics. The resulting candidate p-grammar is evaluated by comparing its
language to the sample. One criterion is th~e K-statistic from the Kolmogorov-
Smirnoff Maximum deviation test. This constructive method exhibits a reason-
able set of a~lternative grammars and avoids the overwhelming computational
burden of other methods.

KEY WORDS : Grammatical inference, clustering, syntactic pattern
recognition, minimum spanning tree, learning grammars,
string dissimilarity, probabilistic grammar

[This page contains 1 picture or other non-text object]

Page 3 of 52

FOOTNOTES

Manuscript received

This paper is based on research conducted by Jiunn-I Liou toward a doc-
toral dissertation, which is one requirement for the Ph.D. degree, Department of Computer Science, Michigan State University, East Lansing, Michigan 48824

This work was supported, in part, by NSF Grant ENG76-11936.

    Jiunn-I Liou is currently a member of the Technical Staff a t the Grumman Corp., Bethpage, New York. Richard Dubes is a Professor in the Department
of Computer Science a t Michigan State University.

'~otations necessary to this paper are given in Appendix A.

    2 In general, samples can be positive (members of the language for the grammar to be inĀ£ erred) or negative (not members). Negative samples imply the presence of a "teacher." Only positive samples are considered in this paper.

    3 ~ i o u [42] has suggested several string dissimilarity measures based on WLD for grammatical reference.

    4~onsider a complete graph whose n nodes correspond to the n strings in S and whose (undirected) edges have weights equal to entries in the proximity matrix (Appendix B). A...