Browse Prior Art Database

Indexing Method Employing Hashing

IP.com Disclosure Number: IPCOM000079589D
Original Publication Date: 1973-Aug-01
Included in the Prior Art Database: 2005-Feb-26
Document File: 4 page(s) / 24K

Publishing Venue

IBM

Related People

Perry, OR: AUTHOR

Abstract

The indexing scheme described utilizes two hashing steps. The first hashing breaks a file logically into a number of pieces of approximately equal size. In conventional terms, each piece corresponds to a bucket. The uncommonly large size of each bucket provides a statistically high probability of approximately equal size. The techniques described provide for unusually convenient handling of overflows from any piece. The lowest level index blocks use techniques described in the publication Index Block with Randomized Keys (IBM Technical Disclosure Bulletin, Vol. 14, No. 3, August 1971, pp. 939-41). They achieve the effect of an extremely low-loading factor without wasting any space. The index is used for random operations only and it is applicable to primary or secondary keys.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 4

Indexing Method Employing Hashing

The indexing scheme described utilizes two hashing steps. The first hashing breaks a file logically into a number of pieces of approximately equal size. In conventional terms, each piece corresponds to a bucket. The uncommonly large size of each bucket provides a statistically high probability of approximately equal size. The techniques described provide for unusually convenient handling of overflows from any piece. The lowest level index blocks use techniques described in the publication Index Block with Randomized Keys (IBM Technical Disclosure Bulletin, Vol. 14, No. 3, August 1971, pp. 939-41). They achieve the effect of an extremely low-loading factor without wasting any space. The index is used for random operations only and it is applicable to primary or secondary keys. Specific sizes for index-block entries and addresses are used by way of example. In an actual case, these parameters can be modified to fit particular needs.

Index Block and Table Formats: The index will always consist of two levels. A lower-level index block has the form seen in Fig. 1, where ki is a 3-byte hashed value related to the key of the ith record and Ai is the 6-byte address of that record. The k's are in ascending order within a block. The value of m is ((block size)/g]. If block size is 1152 bytes, each index block will accommodate 128 records. Index blocks can overflow. The circumstances causing overflow and the methods for handling overflow are described subsequently. The upper level of the index is a table containing a 3-byte entry for each lower-level index block. The table has the form seen in Fig. 2 where kmax-j is either:

(a) The largest value of k in index block j; this value is used if index block j has overflowed; or

(b) all binary ones, this value is used if index block j has not overflowed.

For a 25,000-record file, the table would have approximately 200 entries (one for each of 200 index blocks). Table size would be 600 bytes.

How a Record is Indexed: To index a record, the record's key is hashed twice using two distinct hashing algorithms (H1 and H2) as follows: H1 (key)=j where O<j,/-m

H2 (key)-k where k is a 3-byte value.

The index block and tble entries are devised and maintained in such a way that the following rule applies to each record in the file: The address A of a record is stored adjacent to the value k in the first index block following block j-1 whose kmax is >/-k.

A simple example will clarify the rule.

Assume: Key = JOE H1 (JOE) = 17

H2 (JOE) = 792

1

Page 2 of 4

Then JOE is indexed with k=792 in the first index block following block 16 whose kmax is 792 or greater.

FILE PROCESSING OPERATIONS: Many variations are possible in the procedures that create, maintain, an...