Reversible Dictionary Compression by Reference to Redundant Components of Vocabulary

IP.com Disclosure Number: IPCOM000048293D
Original Publication Date: 1982-Jan-01
Included in the Prior Art Database: 2005-Feb-08
Document File: 3 page(s) / 15K

Publishing Venue

IBM

Related People

Gonvis, DB: AUTHOR [+2]

Abstract

This article describes a technique for organizing a dictionary that exploits the natural redundancy in vocabulary to significantly reduce the space needed to store the dictionary. The storage requirement is reduced by storing root words together with control information that points to word endings and modifiers, which may be concatenated with the root words to construct variations of the words.

This is the abbreviated version, containing approximately 50% of the total text.

Each word to be stored in the dictionary is defined by a root-modifier-ending combination. The dictionary entries provide the logical information which defines the dictionary content placed in storage. Each dictionary entry defines one or more words. An entry contains control information, the root portion of the words stored, up to three references to rules, and a list of modifier/ending references called an "unrule". Together, the rules and the unrule efficiently identify the inflections of the root defined by the dictionary entry.
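The parts of a dictionary entry listed above can be sketched as a small in-memory record. This is only an illustration: the class name `DictionaryEntry`, its field names, and the tagged-tuple form of the unrule are assumptions made for the example, not the byte layout used by the article.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """Hypothetical in-memory view of one dictionary entry."""
    root: str                                   # fixed, common characters
    rule_refs: tuple = ()                       # 0 to 3 rule table references
    unrule: list = field(default_factory=list)  # modifier/ending references
    usage_info: int = 0                         # usage-related control data

    def __post_init__(self):
        # The article allows at most three rule references per entry.
        if len(self.rule_refs) > 3:
            raise ValueError("an entry holds at most three rule references")
        # Each rule reference is described as a one-byte number.
        for ref in self.rule_refs:
            if not 0 <= ref <= 255:
                raise ValueError("a rule reference must fit in one byte")

    @property
    def rule_count(self) -> int:
        """Number of rule references, as recorded in the control information."""
        return len(self.rule_refs)


entry = DictionaryEntry(root="stop", rule_refs=(0,))
print(entry.rule_count)
```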

The control information portion of a dictionary entry specifies the number of rule references in the entry (0 to 3), the displacement to the next dictionary entry, and any usage-related information.
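As a rough illustration of how such control information might be encoded, the sketch below packs it into two bytes. The layout is an assumption chosen for the example (two low bits for the rule count, six bits of usage flags, one byte of displacement); the article does not state the actual bit assignment, and the function names are hypothetical.

```python
def pack_control(rule_count: int, displacement: int, usage_flags: int = 0) -> bytes:
    """Pack control information into two bytes (assumed layout).

    Byte 0: low 2 bits = rule count (0-3), high 6 bits = usage flags.
    Byte 1: displacement to the next dictionary entry (0-255).
    """
    if not 0 <= rule_count <= 3:
        raise ValueError("rule count must be 0 to 3")
    if not 0 <= displacement <= 255:
        raise ValueError("displacement must fit in one byte in this sketch")
    if not 0 <= usage_flags <= 63:
        raise ValueError("usage flags must fit in six bits in this sketch")
    return bytes([(usage_flags << 2) | rule_count, displacement])


def unpack_control(data: bytes) -> tuple:
    """Return (rule_count, displacement, usage_flags) from two packed bytes."""
    first, displacement = data[0], data[1]
    return first & 0b11, displacement, first >> 2


packed = pack_control(rule_count=2, displacement=17, usage_flags=5)
print(unpack_control(packed))
```

With the displacement available in the control bytes, a reader can step from one entry to the next without parsing the variable-length root and unrule in between.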

The root consists of the fixed, common characters of the words defined by the entry. The root need not be a valid word. The rule references are one-byte numbers, each of which indicates which entry (0-255) in the rule table applies to the root. An entry may have up to three rule references, or none. The unrule portion of a dictionary entry defines specific inflections not covered by the more general rules. An unrule references entries in the modifier and ending tables which combine with the root to form the intended words. An unrule can start with any ending references which do not need a modifier to construct the word(s). An unrule can also include a modifier reference, which works in conjunction with the ending(s) that follow it until a new modifier reference changes the modifier or the unrule ends.
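The unrule behavior described above can be sketched as a simple expansion loop. The table contents, the `("E", n)` and `("M", n)` tagged references, and the function name `expand_unrule` are assumptions made for illustration; the article does not say how modifier references are distinguished from ending references in storage.

```python
# Hypothetical ending and modifier tables for the example.
ENDINGS = ["", "s", "ed", "ing"]
MODIFIERS = ["p", "t"]


def expand_unrule(root, unrule, modifiers=MODIFIERS, endings=ENDINGS):
    """Build the words defined by a root and an unrule.

    Each unrule item is ("E", index) for an ending reference or
    ("M", index) for a modifier reference. Endings that precede any
    modifier reference attach directly to the root; a modifier stays
    in effect until the next modifier reference or the end of the unrule.
    """
    words = []
    modifier = ""  # no modifier is active at the start of the unrule
    for kind, index in unrule:
        if kind == "M":
            modifier = modifiers[index]
        elif kind == "E":
            words.append(root + modifier + endings[index])
        else:
            raise ValueError(f"unknown reference kind: {kind!r}")
    return words


unrule = [("E", 0), ("E", 1), ("M", 0), ("E", 2), ("E", 3)]
print(expand_unrule("stop", unrule))
# → ['stop', 'stops', 'stopped', 'stopping']
```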

A rule table entry contains control information and the rule. The rule table has a fixed entry size, defined by the storage space required for the longest rule. The control portion defines the length of the rule and any information needed for the use of the dictionary. A rule is very much like an unrule, except that clusters of endings are used in place of individual ending references. A rule can start with any ending clusters (each an ending reference number and a count of the endings in the cluster) which do not need a modifier to construct the word(s). A rule can also include a modifier reference, which works with the ending cluster(s) following it until a new modifier reference replaces the modifier or the rule ends.
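A rule can be expanded in much the same way as an unrule, with each cluster standing for several endings at once. The sketch below assumes that a cluster of (reference number, count) covers that many consecutive ending table entries starting at the reference number; this reading, the table contents, and the names `RULE_TABLE` and `expand_rule` are illustrative assumptions rather than details confirmed by the article.

```python
# Hypothetical ending and modifier tables for the example.
ENDINGS = ["", "s", "ed", "ing", "er", "est"]
MODIFIERS = ["p", "g"]

# Hypothetical rule table. Each rule is a list of items:
#   ("C", start, count) is a cluster of `count` consecutive endings
#   ("M", index) is a modifier reference
RULE_TABLE = [
    [("C", 0, 2), ("M", 0), ("C", 2, 2)],  # rule 0: "", "s", then p+"ed", p+"ing"
]


def expand_rule(root, rule, modifiers=MODIFIERS, endings=ENDINGS):
    """Build the words defined by applying one rule to a root."""
    words = []
    modifier = ""  # clusters before any modifier attach directly to the root
    for item in rule:
        if item[0] == "M":
            modifier = modifiers[item[1]]
        elif item[0] == "C":
            _, start, count = item
            for ending in endings[start:start + count]:
                words.append(root + modifier + ending)
        else:
            raise ValueError(f"unknown rule item: {item!r}")
    return words


print(expand_rule("stop", RULE_TABLE[0]))
# → ['stop', 'stops', 'stopped', 'stopping']
print(expand_rule("ship", RULE_TABLE[0]))
# → ['ship', 'ships', 'shipped', 'shipping']
```

Both roots reuse the same rule table entry, so each dictionary entry needs only a one-byte rule reference to define four words. This sharing is where the space saving comes from.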

Word endings are stored in a table which has up to 255 entries. The table entry size is defined by the storage space required for the longest ending. An ending table entry contains control information and the ending itself. The con...