Language-Independent Dictionary Storage and Access Technique

IP.com Disclosure Number: IPCOM000042322D
Original Publication Date: 1984-May-01
Included in the Prior Art Database: 2005-Feb-03
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Carlgren, RG: AUTHOR

Abstract

This technique provides for the storage and access of a very large word list in a minimum of storage space and in a language-independent manner. On computer systems that have little available main storage, dictionaries used for linguistic aid support functions must be stored on secondary storage devices. However, some computer systems with word processing functions that could use linguistic aid support also have a limited amount of available secondary storage. This double restriction means that the storage technique used for a dictionary word list must ensure the use of a minimum of both main storage and secondary storage. Previous attempts to represent large word lists have not succeeded in minimizing both main and secondary storage use for both English and non-English word lists.

This is the abbreviated version, containing approximately 51% of the total text.

This storage technique uses a combination of devices that exploit the statistical redundancy of prefixal and suffixal data in a given language.

Prefixal redundancy is exploited by keeping the explicitly stored words in alphabetical sequence and by representing each such word as a count of the number of leading letters repeated from the previous word, followed by a representation of the letters to be added. Where no letters can be repeated, all letters are represented explicitly, using a common encoding technique that exploits the frequency distribution of the letters explicitly represented in the stored word list.

Suffixal redundancy is exploited in two ways. First, an explicitly stored word can be represented as a repeat count and a number that stands for the suffixal letters to be added to the repeated letters to make a new word. This number refers to a position within an index to a table of suffixal letter strings. Second, valid suffixal variations on an explicitly stored word can be represented by including a particular number with the stored word. This number refers to a position within an index to a table of specific suffixal letter string combinations. Each valid letter string in this table can itself be represented as a number that refers to a position within an index to a table of suffixal letter strings.

To extend the usability of the suffixal letter string logic, the string can be defined to include connectivity rules. The connectiv...
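
The repeat-count and suffix-table devices described above can be illustrated with a minimal Python sketch. It is not the disclosure's actual storage format. The table contents, the function names (front_encode, decode_entry, front_decode, expand_variations), and the use of plain Python lists in place of the indexed tables are all assumptions made for illustration. The frequency-based letter encoding and the connectivity rules are left out.

```python
from typing import List, Tuple, Union

# Hypothetical table of suffixal letter strings; a string's position is its number.
SUFFIX_STRINGS = ["s", "ed", "ing"]

# Hypothetical table of suffixal letter string combinations. Each entry lists
# positions in SUFFIX_STRINGS; a stored word carries one combination number.
SUFFIX_COMBINATIONS = [
    [],         # 0: no suffixal variations
    [0],        # 1: -s
    [0, 1, 2],  # 2: -s, -ed, -ing
]

Entry = Tuple[int, Union[str, int]]


def front_encode(sorted_words: List[str]) -> List[Tuple[int, str]]:
    """Represent each word as (repeat count, letters to add)."""
    entries = []
    previous = ""
    for word in sorted_words:
        # Count the leading letters repeated from the previous word.
        count = 0
        limit = min(len(previous), len(word))
        while count < limit and previous[count] == word[count]:
            count += 1
        entries.append((count, word[count:]))
        previous = word
    return entries


def decode_entry(previous: str, repeat_count: int, added: Union[str, int]) -> str:
    """Rebuild one word from the previous word and one stored entry.

    `added` is either explicit letters or a number that selects a string
    from SUFFIX_STRINGS (the first use of suffixal redundancy).
    """
    letters = SUFFIX_STRINGS[added] if isinstance(added, int) else added
    return previous[:repeat_count] + letters


def front_decode(entries: List[Entry]) -> List[str]:
    """Rebuild the full word list by decoding the entries in sequence."""
    words = []
    previous = ""
    for repeat_count, added in entries:
        previous = decode_entry(previous, repeat_count, added)
        words.append(previous)
    return words


def expand_variations(word: str, combination_number: int) -> List[str]:
    """Return the stored word plus its valid suffixal variations
    (the second use of suffixal redundancy)."""
    positions = SUFFIX_COMBINATIONS[combination_number]
    return [word] + [word + SUFFIX_STRINGS[p] for p in positions]


if __name__ == "__main__":
    word_list = ["walk", "walked", "walking", "wall", "want"]
    print(front_encode(word_list))
    print(front_decode([(0, "walk"), (4, 1), (4, 2)]))
    print(expand_variations("walk", 2))
```

For the alphabetical list walk, walked, walking, wall, want, front_encode yields (0, "walk"), (4, "ed"), (4, "ing"), (3, "l"), (2, "nt"). Under the assumed tables, storing "walk" with combination number 2 stands for walk, walks, walked, and walking without storing any of the variants explicitly.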