Browse Prior Art Database

Hash Trees

IP.com Disclosure Number: IPCOM000040185D
Original Publication Date: 1987-Oct-01
Included in the Prior Art Database: 2005-Feb-02
Document File: 3 page(s) / 17K

Publishing Venue

IBM

Related People

Rubin, F: AUTHOR

Abstract

To achieve fast retrieval of data using symbolic names, a method called hash trees combines the features of tree search with hashing to obtain the benefits of both. In cases where the number of names is not known with any accuracy beforehand, or where the number of names increases over time, the use of hash trees, checkwords, and extension lists lets searching be carried out with fewer look-ups and less wasted space than existing methods, without the need to reorganize the tables. The method is to take the name of an item, and to change (hash) the characters of the name to new values in such a way that any value is equally likely in any position in the string, independent of context. That is, the hashed character values in each position in the string should be uniformly and independently distributed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 38% of the total text.

Page 1 of 3

Hash Trees

To achieve fast retrieval of data using symbolic names, a method called hash trees combines the features of tree search with hashing to obtain the benefits of both. In cases where the number of names is not known with any accuracy beforehand, or where the number of names increases over time, the use of hash trees, checkwords, and extension lists lets searching be carried out with fewer look-ups and less wasted space than existing methods, without the need to reorganize the tables. The method is to take the name of an item, and to change (hash) the characters of the name to new values in such a way that any value is equally likely in any position in the string, independent of context. That is, the hashed character values in each position in the string should be uniformly and independently distributed. Then a tree is built using these values to navigate from level to level. Merely changing the characters on a one-to-one basis would not achieve this type of hashing, even if the translation were dependent on the position within the string. For example, if A and Z in the third position were always translated to & and #, and if they occurred in third position 10% and 0.01% of the time, respectively, then & and would occur in third position 10% and
0.01% of the time. There would be no improvement in the distribution of character frequencies by such a scheme. There are many possible ways to hash a name to character values having uniform frequency distribution. Several methods have been tried successfully. All of them use a translation table that converts a character value to a random character. The translation table is a permutation of the normal character values, so that two different characters never translate to the same value. The translation table is built once, then stored for use. In fact, it can be stored as a constant in the hashing program. A process used to derive the hashing table is: For I = 0 to 255 /* Initialize the table. */

H(I) = I

For I = 0 to 255

J = Random(256) /* Choose a random element. */

Swap (H(I), H(J)) /* Swap elements. */ Hashing is done by making several passes through the characters in the name. Before hashing, any short names are extended to a minimum length (h+f) by appending blanks or null characters. The hashing process for a name N of length L is as follows: For I = 1 to L-1

N(I) = N(I) && H(N(I-1)) /* First pass. */

N(0) = N(0) && H(N(L-1)) /* Wrap around. */

For I = 1 to L-1

N(I) = N(I) && H(N(I-1)) /* Second pass. */

N(0) = N(0) && H(N(L-1)) /* Wrap around. */

For I = 1 to L-1

N(I) = N(I) && H(N(I-1)) /* Third pass. */ This hashing process is reversible. It guarantees that no two names can ever hash to the same string of values. The values produced by the hashing process above are integers modulo 256, implying a branching

1

Page 2 of 3

factor of 256. A branching factor of 16 could be obtained by unpacking these values into hexadecimal digits. Branching factors of 4, 8, 32, etc., can be ach...