Browse Prior Art Database

Data Organization for Storing Permuted Keys

IP.com Disclosure Number: IPCOM000078158D
Original Publication Date: 1972-Nov-01
Included in the Prior Art Database: 2005-Feb-25
Document File: 4 page(s) / 54K

Publishing Venue

IBM

Related People

Clarke, DC: AUTHOR

Abstract

This is a data organization and storing method for handling textual material for display on a computer-controlled display station.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 4

Data Organization for Storing Permuted Keys

This is a data organization and storing method for handling textual material for display on a computer-controlled display station.

A widely used techniQue in the publication process for the titles of technical reports, is the Keyword-in context (KWIC) or "permuted" type of listing. This listing has the feature that each title is repeated once for each word that it contains (except words contained in a "stop list"). The titles are then permuted and sorted to the right of the window position, to produce a list in which a title can be found if any of its words are known. For example, a paper entitled EVALUATION OF INFORMATION RETRIEVAL SYSTEMS would be found in the E's under EVALUATION, in the I's under INFORMATION, etc.

Described below is a data organization and sorting method which sorts the keys to the left, as well as to the right of the window position for storage and display. The following steps in the method will be described:

(l) Permuting

(2) Sorting

(3) Compression

(4) Storing

(5) Formatting for display

(1) Permuting

"Permuting" is the process that produces all the variations of a key required for its later retrieval for display. As an example, consider the term: ANALOG COMPUTER TECHNOLOGY. The permuting algorithm given in Fig. 1 generates the following variations to be stored: 1) TECHNOLOGY Psi COMPUTER Beta ANALOG

2) COMPUTER Beta TECHNOLOGY Psi ANALOG Beta

3) ANALOG Beta COMPUTER Beta TECHNOLOGY Psi. where represents the nonprinting hexadecimal character '3F'x, and represents the "blank" hexadecimal character '40'x.
(2) Sorting.

Because of the way the terms are permuted, the sorting procedure is straight- forward and can be done with generalized sorting procedures available. The permuting step records the length of the longest key encountered during that step: all keys are then end-padded with blanks to match that length in preparation for the sorting step. The sort then operates on the entire field of permuted key plus the end-padding. Since the ('3F'X) character sorts lower than the blank character ('40'X), the proper distinction is maintained between nouns and their attributives within a phrase, so that upon display all instances of a word used as a noun precede all instances of the same word used as an attributive (see Fig. 3(a) where TECHNOLOGY appears in both conditions). (3) Compression.

1

Page 2 of 4

After sorting the keys it is necessary to check for duplicates, since the keys have been extracted from a master file and several master file records will, typically, contain the same key. At the same time a record is established as to how many and which master file records contain the key (so that the master records may later be retrieved via the key in any of its permuted forms) and finally, the end-padding introduced for sorting purposes is now removed to reduce storage requirements. (4) Storing.

The keys, along with the data pertaining to their master records, constitut...