
NEXT WORD Statistical PREDICTOR in Correspondence

IP.com Disclosure Number: IPCOM000044407D
Original Publication Date: 1984-Dec-01
Included in the Prior Art Database: 2005-Feb-05
Document File: 2 page(s) / 53K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

Constructing a practical statistical predictor that is implicitly dependent upon semantic context exploits the semantic clustering inherent in the training text, defining global bigrams and using them for prediction. The predictive power inherent in long-term memory in text may be used to exploit whatever semantic clustering (table with chair, etc.) can be found by statistical clustering of the training text. Given a data base of correspondence, i.e., letters or memoranda, a method is described for constructing from it a practically computable statistical predictor which is implicitly dependent on semantic context.

This is the abbreviated version, containing approximately 73% of the total text.


NEXT WORD Statistical PREDICTOR in Correspondence

Constructing a practical statistical predictor that is implicitly dependent upon semantic context exploits the semantic clustering inherent in the training text, defining global bigrams and using them for prediction. The predictive power inherent in long-term memory in text may be used to exploit whatever semantic clustering (table with chair, etc.) can be found by statistical clustering of the training text. Given a data base of correspondence, i.e., letters or memoranda, a method is described for constructing from it a practically computable statistical predictor which is implicitly dependent on semantic context.

Let p(w) be any predictor with LOCAL CONTEXT, such as the one-gram relative frequencies (CONTEXT-FREE) f(w),

    p(w) = f(w) = (number of occurrences of w in the training text) / (total number of words in the training text),

or the n-gram relative frequencies f(w_{n+1} | w_1 ... w_n), etc.

In analogy with the (local, contiguous) bigram (2-gram) counts, we define the GLOBAL BIGRAM COUNTS M(w2 | w1) as

    M(w2 | w1) = number of occurrences of the pattern w1 x x ... x w2 within some memo/letter of the training text,

where x x ... x is an arbitrary (possibly null) string of words.

Consider an initial sequence w_1^n = w1 w2 ... wn of words in a memo or letter. To predict the next word we compute the MEMOGRAM PREDICTOR or GLOBAL CONTEXT PREDICTOR q(w | w_1^n) as
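As a concrete illustration of the definitions above, the following sketch (in Python; it is not part of the original disclosure, and all function and variable names are illustrative) builds the context-free unigram predictor f(w) and the global bigram counts M(w2 | w1) from a small training set of memos:

from collections import Counter, defaultdict

def train_unigram(memos):
    """Context-free predictor: p(w) = f(w) = count of w / total words in training text."""
    counts = Counter(w for memo in memos for w in memo)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def train_global_bigrams(memos):
    """Global bigram counts M(w2 | w1): occurrences of w1 ... w2, with an
    arbitrary (possibly null) gap, within some memo of the training text.
    The disclosure leaves the exact counting convention implicit; here each
    ordered pair of word positions within a memo is counted once."""
    M = defaultdict(Counter)
    for memo in memos:
        for i, w1 in enumerate(memo):
            for w2 in memo[i + 1:]:
                M[w1][w2] += 1
    return M

# Toy training text: each memo/letter is a list of words.
memos = [
    "please order a table and a chair for the office".split(),
    "the chair next to the table is broken".split(),
]
f = train_unigram(memos)
M = train_global_bigrams(memos)
print(f["table"])           # unigram relative frequency f(table)
print(M["table"]["chair"])  # global bigram count M(chair | table)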

(Image Omitted)

for every w in the vocabulary. Here, a is an optimally chosen threshold constant (e.g., .001) and

(Image Omitted)
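The formula for q(w | w_1^n) appears only in the images omitted above, so the exact combination rule is not recoverable from this extraction. Purely as a hedged illustration of how pooled global bigram evidence and a threshold constant a might interact with a local predictor, the sketch below (reusing f and M from the earlier example) backs off to p(w) = f(w) whenever the global evidence falls below a; this particular combination rule is an assumption, not the disclosed formula.

# Hypothetical sketch only: the disclosed formula for the memogram predictor
# q(w | w1 ... wn) was suppressed as an image, so the combination rule below
# (normalized global bigram evidence, backed off to the local predictor p(w)
# when that evidence is below the threshold a) is an assumption.

def memogram_predict(history, w, f, M, a=0.001):
    """Score candidate word w given the words seen so far in the memo."""
    # Pool global bigram counts M(w | w1) over every word w1 already seen.
    evidence = sum(M[w1][w] for w1 in history)
    total = sum(sum(M[w1].values()) for w1 in history)
    q = evidence / total if total else 0.0
    # Fall back to the local (context-free) predictor when the global
    # evidence is weaker than the threshold constant a.
    return q if q >= a else f.get(w, 0.0)

history = "please order a table and a".split()
best = max(f, key=lambda w: memogram_predict(history, w, f, M))
print(best)  # most likely next word under this illustrative combination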