Browse Prior Art Database

H-Trees

IP.com Disclosure Number: IPCOM000128557D
Original Publication Date: 1979-Dec-31
Included in the Prior Art Database: 2005-Sep-16
Document File: 10 page(s) / 31K

Publishing Venue

Software Patent Institute

Related People

K. Maly: AUTHOR [+3]

Abstract

In this paper we describe a new eclectic method to solve the problem of searching a medium to large sized file of n records for a record whose key matches a given key. The scheme combines hashing and tree methods; it is based on a tree of different hashing functions which, when applied to a set of keys, yields a data structure called the H-tree (for Hash-tree). The scheme is, within certain parameters, a self-adaptive one in the sense that the hashing functions used are mainly determined by the keys and their distribution in the file. A performance evaluation shows the H-tree superior to a variety of key-comparison and hashing techniques. Analytical bounds on the average retrieval time and storage requirements are log ,d (n/s) and d n/(s In d) where s and d are parameters of the H-tree. Criteria are provided for the selection of optimal parameters.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 15% of the total text.

Page 1 of 10

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

H-Trees

K. Maly and I,. Kampa

Computer Science Department

11.4 Lind Hall

Institute of Technology University of Minnesota Minneapolis, Minnesota

'technical Report 79-27 November 1979 Cover design courtesy of Ruth and Jay Leavitt

H-Trees by K. Maly and L. Kampa

Abstract

In this paper we describe a new eclectic method to solve the problem of searching a medium to large sized file of n records for a record whose key matches a given key. The scheme combines hashing and tree methods; it is based on a tree of different hashing functions which, when applied to a set of keys, yields a data structure called the H-tree (for Hash-tree). The scheme is, within certain parameters, a self-adaptive one in the sense that the hashing functions used are mainly determined by the keys and their distribution in the file. A performance evaluation shows the H-tree superior to a variety of key-comparison and hashing techniques. Analytical bounds on the average retrieval time and storage requirements are log ,d (n/s) and d n/(s In d) where s and d are parameters of the H-tree. Criteria are provided for the selection of optimal parameters.

Key Words and Phrases: Key search, key comparison and hashing techniques, data structure, retrieval time, storage requirements.

CR Categories: 3.74, 4.33, 4.34.

This research was in part supported by a grant of UCC - Universi.ty of Minnesota, Minneapolis.

1. Introduction

An all pervasive problem in computer science is the searching of a collection of records (file) for that record whose key matches a given key. Appearing in many forms and variations, the essence of the problem con be stated as follows: given 'key' and a collection of keys X = {ki , 1__i_:_n} , find the index (location, address) j such that key = k j. Over the years a large number of papers have appeared on this subject; the solutions can be broadly classified into key-to- address-transformation (hashing) techniques, and key comparision methods. Various studies have been performed on hashing techniques [6, 7, 8, 9, 11, 12] to investigate the best ways of handling collisions, insertions, and deletions. Almost all methods based on key comparisons [1, 4, 8, 10] imposO some tr

University of Minnesota Page 1 Dec 31, 1979

Page 2 of 10

H-Trees

involving m-ary trees have a worst case complexity of 0(log n). However, in m practical cases, the more important behavior is the average performance; in an exhaustive study [9] on hashing schemes for medium sized files ('20,000 records) it was shown that on the average hashing performs extremely well, e.g., using the division method, with a bucket size of 20 and a load factor of .9, the average number of bucket accesses was 1.09 per search. In terms of memory requirements, hashing schemes which use chaining need additional pointers to maintain the linked lists of colliding elements, whereas tree 2

schemes normally need space for the structural information of t...