Browse Prior Art Database

Clustering Algorithm for Hierarchical Structures

IP.com Disclosure Number: IPCOM000088524D
Original Publication Date: 1977-Jun-01
Included in the Prior Art Database: 2005-Mar-04
Document File: 4 page(s) / 19K

Publishing Venue

IBM

Related People

Schkolnick, M: AUTHOR

Abstract

This article describes a method for determining the optimal grouping of a given hierarchical structure in data set groups in order to minimize the expected number of page accesses to it when the access pattern is known. The applicability of the technique is based on the fact that the algorithm it uses runs in time proportional to the number of nodes in the hierarchy.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 29% of the total text.

Page 1 of 4

Clustering Algorithm for Hierarchical Structures

This article describes a method for determining the optimal grouping of a given hierarchical structure in data set groups in order to minimize the expected number of page accesses to it when the access pattern is known. The applicability of the technique is based on the fact that the algorithm it uses runs in time proportional to the number of nodes in the hierarchy.

The problem considered is how to store hierarchically structured data so that, when an access pattern to it is known, the total number of disk requests is minimized. The hierarchically structured data is accessed by a data base management system (DBMS) upon request by a user. When a user requests that data be retrieved or altered, the DBMS in general will have to access a number of elements of the hierarchical structure in order to find the relevant data to this particular request. Exactly which records are accessed will depend on (1) which record was last accessed before the user's request was made and (2) the existing access paths in the hierarchical structure. These access paths determine which record can be visited next after a given record.

In general, not every record in a structure can be visited immediately after a given record since this would require a capability for random access into the structure. This capability is very costly in terms of implementation. However, a restricted capability is frequently present, and in the case of a hierarchical structure, this is almost always the capability of visiting nodes which are related in the structure by a parent to child relationship.

Thus, one finds the capability of going from any given record to (some of) its children, to its (unique) parent and to its (next) brother. In order to describe the method and solution presented here, we will use the IMS System as an example [1]. One will also make use of some of the IMS terminology, since it is well known and understood by people working in the Data Base area (see, for example, Date's book [2]). The ideas presented here are, however, not restricted to the IMS System and are applicable in general to any system with Hierarchical Data Structures.

One assumes that one is given a description of a type tree, an access method (i.e., which access paths are present) and a traffic frequency of accesses to it. This traffic frequency is given by the following parameters: n[I,I] , the (relative) number of times that the DBMS accesses a segment of type I immediately after it accesses a segment of the same type I. n[I,J] , the (relative) number of times that the DBMS accesses a segment of type J immediately after it accesses a segment of type I, provided segment type J is a child segment of segment type I. n[J,I], the (relative) number of times that the DBMS accesses a segment of type I immediately after it accesses a segment of type J, provided segment type J is a child segment of segment type I.

The above parameters are given for...