Browse Prior Art Database

Method for Ordering and Traversing a Hash Collision Chain

IP.com Disclosure Number: IPCOM000108259D
Original Publication Date: 1992-May-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 4 page(s) / 125K

Publishing Venue

IBM

Related People

Schendemann, W: AUTHOR [+2]

Abstract

When the expectation of collisions for a hashing function is quite frequent, i.e., number of buckets in the hash table is small in comparison with the number of identifiers, the collision chain can grow to be very large. As a result, the time needed to locate a specific identifier increases at a linear rate. The average number of elements that need to be processed is N/2, where N is the number of elements currently in the queue. Having the identifiers that appear most frequently in the data at the beginning of the chain would greatly reduce this search time. The problem is compounded if it cannot be determined in advance the relative density of the identifiers in the data.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Method for Ordering and Traversing a Hash Collision Chain

       When the expectation of collisions for a hashing function
is quite frequent, i.e., number of buckets in the hash table is small
in comparison with the number of identifiers, the collision chain can
grow to be very large. As a result, the time needed  to locate a
specific identifier increases at a linear rate. The average number of
elements that need to be processed is N/2, where N is the number of
elements currently in the queue. Having the identifiers that appear
most frequently in the data at the beginning of the chain would
greatly reduce this search time.  The problem is compounded if it
cannot be determined in advance the relative density of the
identifiers in the data.

      This algorithm describes a mechanism of ordering a hash queue
based on the relative occurrence of an identifier, and a method of
searching the queue. The algorithm will perform extremely  well if
the data stream contains "bursts" of identifiers.
DEFINITIONS
  B         hash bucket.
  C(LAST,B) address of last element processed in hash bucket b.
  C(B)      contents of hash bucket. If C(b) = 0, then no collision
has occurred. Otherwise, C(b) is a pointer to the "head" of the
collision chain.
  X         identifier. F(x) = b (i.e., hashing function).
  P         p'th element in linked list.
  PRIOR     address of prior element in list.
  MOVED     address of element that was repositioned in collision
queue.
  ID(P)     identifier of p'th element in list.
  NEXT(P)   pointer to the next element in list.
  FREQ(P)   frequency count.

      An example of a hash queue is as follows:

                            (Image Omitted)

ASSUMPTIONS
      o    the "load density" for the hash buckets will be high
(i....