Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Method For Organizing A Hash Table

IP.com Disclosure Number: IPCOM000114789D
Original Publication Date: 1995-Feb-01
Included in the Prior Art Database: 2005-Mar-29
Document File: 4 page(s) / 150K

Publishing Venue

IBM

Related People

Blandy, GO: AUTHOR

Abstract

Disclosed is a method for organizing a hash table. A data field is reused to act as an collision pointer anchor thereby reducing storage requirements. Techniques to minimize Central Processing Unit (CPU) utilization during hash entry lookups are also disclosed.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 39% of the total text.

Method For Organizing A Hash Table

      Disclosed is a method for organizing a hash table.  A data
field is reused to act as an collision pointer anchor thereby
reducing
storage requirements.  Techniques to minimize Central Processing Unit
(CPU) utilization during hash entry lookups are also disclosed.

      Typical hash tables become inefficient as the capacity of the
hash table is approached.  Traditional solutions involve:
  o  initial over allocation of storage
  o  dynamic reorganization
  o  use of collision pointers in the table entries and
  o  dynamic acquisition of overflow storage

      The amount of storage required by a hash table can have a
negative impact on system performance.  This is particularly true if
the hash table resides higher in the storage hierarchy than the
elements it represents.  For example, a hash table may reside in a
relatively small but very fast main storage and represent the
enormous
number of records that could reside on larger and slower DASD.

      Each of the above stated solutions requires significant excess
storage at one or more times as the number of hash table entries
increases.

      The first solution allocates enough storage at initialization
to hold the number of elements required during peek use of the table.
When the table is not at peek use, its excess storage is unavailable
to other tasks in the system and therefore wasted.

      The second solution is a dynamic version of the first solution.
When the hash table approaches capacity, it creates a new table with
some excess capacity.  The advantage here is that the growth is
incremental and the excess can be minimized.  The disadvantage is
that a significant amount of CPU time is required for the
reorganization and system performance can become very "lumpy".

      Note that neither of the first two solutions involve collision
pointers.  This means that significant extra overhead is required
when a collision is encountered.  Normally this involves iterative
rehashing (using different hash algorithms than the original) or
searching (forward or backward) from the original hash node or a
combination of the two.

      The third solution requires an additional pointer field in each
of the hash table entries.  The pointer field would typically be the
size of the address word for the computer architecture being used.
For System 390 and most current workstation architectures (POWER,
POWER PC* 601 & 604) this would be 32 bits.

      This disclosure introduces a more efficient implementation of
the "collision pointer" solution.  It provides for reuse of one of
the hash node fields to act as a collision pointer when a collision
exists but allows that word to be used for hash node data when no
collision exists.  A single bit is used to indicate if the dual
purpose field is being used as a collision pointer or a data
repository.  For example if a hash table node is three words (32 bits
per...