Technique for Storing and Retrieving Linguistically Related Terms

IP.com Disclosure Number: IPCOM000040957D
Original Publication Date: 1987-Apr-01
Included in the Prior Art Database: 2005-Feb-02
Document File: 1 page(s) / 12K

Publishing Venue

IBM

Related People

Zamora, E: AUTHOR

Abstract

A technique is described for storing linguistically related information in a compact form that is easily decoded. The technique makes it possible to match text very efficiently against a dictionary of phrases or multiple-word expressions. Searching the dictionary is sped up by eliminating searches that cannot result in a match. This is done by indexing each phrase on its least frequent word and by building a hash screen of the phrase words that are adjacent to the index word. Thus, even though a phrase may consist of very common words, the dictionary is not searched unless the text being scanned exhibits some of the characteristics of the phrases it could match.
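A minimal Python sketch of the lookup described above follows. It is illustrative only and is not the disclosed implementation. The following are all assumptions: the word-frequency table, the 64-bit screen width, the use of `zlib.crc32` as the screen hash, the restriction to phrases of two or more words, and the function names `build_index` and `find_phrases`. Each phrase is filed under its least frequent word. The screen for that word records one hashed bit for each phrase word immediately before or after it. A text occurrence of the index word therefore triggers a dictionary access only when one of its neighboring text words hits a set bit.

```python
import zlib
from dataclasses import dataclass, field

SCREEN_BITS = 64  # assumed screen width: one machine word per index word


def screen_bit(word: str) -> int:
    """Hash a word to one bit position of a screen (assumed hash: CRC-32)."""
    return zlib.crc32(word.encode("utf-8")) % SCREEN_BITS


@dataclass
class IndexEntry:
    screen: int = 0  # bit mask of hashed words adjacent to the index word
    phrases: list = field(default_factory=list)  # (words, index position, data)


def build_index(phrases: dict, frequency: dict) -> dict:
    """File each phrase under its least frequent word.

    phrases: phrase text -> associated data (e.g. replacement text).
    frequency: assumed word -> corpus frequency table; unknown words count as 0.
    """
    index = {}
    for text, data in phrases.items():
        words = text.lower().split()
        if len(words) < 2:
            continue  # assumption: only multiple-word expressions are stored
        pos = min(range(len(words)), key=lambda i: frequency.get(words[i], 0))
        entry = index.setdefault(words[pos], IndexEntry())
        for j in (pos - 1, pos + 1):  # words adjacent to the index word
            if 0 <= j < len(words):
                entry.screen |= 1 << screen_bit(words[j])
        entry.phrases.append((words, pos, data))
    return index


def find_phrases(tokens: list, index: dict) -> list:
    """Return (start, phrase, data) for every indexed phrase found in tokens."""
    tokens = [t.lower() for t in tokens]
    matches = []
    for i, tok in enumerate(tokens):
        entry = index.get(tok)
        if entry is None:
            continue  # not an index word: no dictionary access at all
        neighbors = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
        if not any((entry.screen >> screen_bit(w)) & 1 for w in neighbors):
            continue  # screened out: neither neighbor hashes to an adjacent phrase word
        for words, pos, data in entry.phrases:  # the costly dictionary access
            start = i - pos
            if start >= 0 and tokens[start:start + len(words)] == words:
                matches.append((start, " ".join(words), data))
    return matches


# Illustrative frequencies only, not real corpus counts.
freq = {"hot": 900, "dog": 700, "at": 5000, "the": 9000,
        "end": 1500, "of": 8000, "day": 1200}
index = build_index({"hot dog": "frankfurter",
                     "at the end of the day": "ultimately"}, freq)
print(find_phrases("I ate a hot dog at the end of the day".split(), index))
# [(3, 'hot dog', 'frankfurter'), (5, 'at the end of the day', 'ultimately')]
```

The screen can produce false positives through hash collisions, so a full comparison is still made after the screen passes. It never produces false negatives: a true match always places one of the phrase's adjacent words next to the index word in the text. Common words such as "the" are never index words here, so scanning them costs only a failed in-memory lookup.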

This method of matching text against dictionaries of phrases is useful in both batch and real-time applications, such as:
1) identifying trite or incorrect phrases for replacement with standard text;
2) identifying idioms or other multiple-word terms that cannot be translated in isolation from their context;
3) providing synonym support for multiple-word terms (e.g., hot dog) where single-word matching is inadequate.
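For the first and third applications, the phrases found in the text must then be replaced by their associated text. A minimal Python sketch of that step follows. It assumes matches arrive as hypothetical `(start, length, replacement)` triples over the scanned words. The function name `replace_phrases` and the leftmost-longest rule for overlapping matches are illustrative choices, not part of the disclosure.

```python
def replace_phrases(tokens, matches):
    """Substitute associated text for matched phrases.

    tokens: the scanned words.
    matches: hypothetical (start, length, replacement) triples.
    Overlapping matches are resolved leftmost-longest (an assumed policy).
    """
    out, pos = [], 0
    for start, length, replacement in sorted(matches, key=lambda m: (m[0], -m[1])):
        if start < pos:
            continue  # overlaps a phrase that was already replaced
        out.extend(tokens[pos:start])  # copy unmatched words unchanged
        out.append(replacement)
        pos = start + length
    out.extend(tokens[pos:])  # copy any words after the last match
    return out


words = "at the end of the day we ate a hot dog".split()
print(" ".join(replace_phrases(words, [(0, 6, "ultimately"), (9, 2, "frankfurter")])))
# ultimately we ate a frankfurter
```

Sorting by start position, and by descending length among matches with the same start, lets one left-to-right pass keep the longest phrase at each position and skip anything it overlaps.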

This method provides:
1. very fast access to phrases and their associated data;
2. rapid elimination of non-candidate word combinations by:
   - screening to prevent access to the phrase database, and
   - organizing phrases by their statistically least frequent word;
3. retrieval of associated replacement or information...