Browse Prior Art Database

Size Dependent Bit Shift Hashing Algorithm

IP.com Disclosure Number: IPCOM000107609D
Original Publication Date: 1992-Mar-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 4 page(s) / 188K

Publishing Venue

IBM

Related People

Azarbayejani, NA: AUTHOR [+2]

Abstract

When determining a hashing function, the main concern is to create an algorithm that produces as even a distribution as possible of values within the hash table boundary. Even distribution in the hash table becomes a major performance issue when the program requires a large amount of data to be stored in the hash table and storage considerations require that the hash table be as small as possible. This translates into a desired ratio of 3/4 or greater between the number of entries in the hash table and the actual number of slots in the table. A hashing algorithm should be chosen to distribute the hash table entries evenly throughout the table, so that the collision list for any one hash table slot is very small.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 28% of the total text.

Size Dependent Bit Shift Hashing Algorithm

       When determining a hashing function, the main concern is
to create an algorithm that produces as even a distribution as
possible of values within the hash table boundary.  Even distribution
in the hash table becomes a major performance issue when the program
requires a large amount of data to be stored in the hash table and
storage considerations require that the hash table be as small as
possible.  This translates into a desired ratio of 3/4 or greater
between the number of entries in the hash table and the actual number
of slots in the table.  A hashing algorithm should be chosen to
distribute the hash table entries evenly throughout the table, so
that the collision list for any one hash table slot is very small.
Disclosed here is an algorithm that significantly improves the hash
table collision rate for a large hash table with a large (3/4 or
greater) ratio of hash table entries to hash table slots.

      In describing the algorithm, the example shown assumes an
8-character case insensitive hashing key, assumes that the computer
used follows the Big Endian byte ordering scheme (most significant
byte is in the lowest memory address), assumes availability of a
"Shift Left Double Logical" function and double register division,
and assumes 4-byte hash table indexes.  Allowances need to be made
when these assumptions are not valid, but the concepts are still the
same.

      The Size Dependent Bit-shift Hashing Algorithm is based on the
prevalent hash function method of converting a hash key string into a
number, folding it, and adding the halves into a 4 byte hash index.
The drawback of this simple folding method, is that when the
characters are directly folded together their significance is
combined, and each character in the key loses some of its own
significance. The Size Dependent Bit-shift Hashing Algorithm adds
bit-shifting to this basic scheme to return some of the significance
lost while folding.  This algorithm also takes the shifting one step
further by calculating the optimal number of bits to be shifted to
give the most significance possible to each character in the hash
key.  This is done by considering the size of the hash table and the
method of character to number translation.

      Implementing the Size-Dependent Bit-shift Hashing Algorithm can
be divided into three parts: preparatory steps, initialization steps,
and run-time hash index calculation steps.

      The preparatory steps necessary for implementing the algorithm
are:
      1.   Determine the method of character-to-number translation.
      2.   Determine the size of the hash table (number of hash table
slots).

      If characters are used as is, their numerical representation is
usually very high.  For example, the EBCDIC translation of the
character 'A' is hex 'C1'.  Since the Size-Dependent Bit-shift
Algorithm involves bit-shifting, it is advantageous to choose a
...