Browse Prior Art Database

"optimal" Hashing Function

IP.com Disclosure Number: IPCOM000128515D
Original Publication Date: 1974-Dec-31
Included in the Prior Art Database: 2005-Sep-16
Document File: 4 page(s) / 19K

Publishing Venue

Software Patent Institute

Related People

Kurt Maly: AUTHOR [+3]

Abstract

We investigate the class of fait extraction functions as to their merits for the purpose of hashing keys. A heuristic algorithm is provided which will find the optimal or near optimal function among this class for a particular set of keys, where a function is optimal if the number of collisions :and the amount of storage is minimal. We discuss various applications together with specific experiments on FORT~N keywords and-variable names.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 32% of the total text.

Page 1 of 4

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

"optimal" Hashing Function

by Kurt Maly

Computer, Information, and Control Sciences Institute oEUR Technology University aEUR Minnesota Minneapolis, Minnesota 55455 Technical Report 74-21 September, 1974 hover design courtesy oEUR Ruth and Jay Leavitt "optimal" Hashing Function by. Kurt Maly

Department of Computer and Information Sciences University of Minnesota Minneapolis, Minnesota 55455

. . Abstract

We investigate the class of fait extraction functions as to their merits for the purpose of hashing keys. A heuristic algorithm is provided which will find the optimal or near optimal function among this class for a particular set of keys, where a function is optimal if the number of collisions :and the amount of storage is minimal. We discuss various applications together with specific experiments on FORT~N keywords and-variable names.

Keywords and ghrases;s hashing, keys, loading factor, expected number of probes.

CR categories: 3.70, 3.74, 4.12, 5.30.

1. Introduction

We address ourselves not to the problem of new hashing schemes, rather we investigate a class of functions for the use of hashing keys. The class of functions we consider is a particular simple ode,. namely, bit extraction. functions. Thus, they certainly fulfill the criterion.,for hashing functions that they be speedily computed. For the second criterion to be satisfied, that the
.number of collisions be minimized, we have to restrict our attention to the' following two situations. First, the set of keys is a-priori known as, for instance, is the case with hash tables of keywords; operators, etc..in compilers and second, the distribution of keys liable to occur in a table is.known. An ex-ampie::for the. latter situation is the symbol table.in a.compiler for which an-approximate distribution may be obtained by correlating the variable. names a.. user group employs over a given period of time. We shall,,_,however, emphasize the first case since this research originated an investigation of an interesting optimization feature in some programming languages (e.g., SETL and SPITBOL). Therein it is possible to direct the compiler to find an optimal implementation of sets which the programmer knows remain static throughout execution of the program. One solution to this problem is to provide the compiler, with a class of hashing functions and an algorithm to select an optimal one for any given set. For the above mentioned situations we shall dive algorithms to find the optimal hashing function in the class of bit extraction functions. Where a function is, optimal if the number of collisions and the amount of storage is minimal. This class has been mostly neglected because, for one reason, there are obvious sets of keys where any such function will perform most dismally (e.g., take the n keys which can be viewed as the corner points of a unit cube in n-dimensional space). On the other

University of Minnesota Page...