Browse Prior Art Database

REORGANIZATION POINTS FOR FILE DESIGNS WITH NONLINEAR PROCFSSING COSTS

IP.com Disclosure Number: IPCOM000128260D
Original Publication Date: 1975-Dec-31
Included in the Prior Art Database: 2005-Sep-15
Document File: 3 page(s) / 18K

Publishing Venue

Software Patent Institute

Related People

K. Sundar Das: AUTHOR [+5]

Abstract

An importan t problem in accessing and updating large scale data bases is thedetermination of when to reorganize the data and eliminate records that have been flagged for deletion but have not as yet been physically deleted. Such records take up valuable storage space and cause a degradation of access times to the active records, mainly because of,the necessity to maintain over- flow areas. After a certain length of time it is usually desirable to merge the old data and its overflow areas into a new data base of the oric-inal format, thus reallocating the physical storage for maximum efficiency within the constraints imposed by the data base organization. The purpose of this study is to de ' termine the effect of various system and user parameters on t1le choice of optimal reorganization points over a wide range of file organizations and user applications. The analysis of file reorganization is seen as a first step toward understanding large scale data base reorganization. The models for file organizations Eire based on the hierarchical access tree descriptions developed by Yao [3). This generalized model is applicable to the access mechanisms for all file orvanizations (i.e.. file desi;.Zns) included in study. The model has been implemented and tested with actual files, and is called the File Design Analyzer program [1]. The program accepts as input, various parameters describing the file characteristics, user workload and hardware for storing the fil.e. The output is a summary of secondary storage space and 1/0 processing time required to process the user workload. A near-optimal data reorganization point is determined on the basis of minimum cost of user access to the file. Cost is defined in this context as the product of secondary storage space and 1/0 processing time for a given set of activities over a specified time interval t. Storage space is computed as the space to house the entire file including intermediate data such as pointers, directories, etc. 1/0 processing time for.the user application includes the retrieval, modification, insertion and deletion of records; modi-fication and insertion of keyword values at an intermediate access tree level; and query type retrieval. 1/0 processing time is defined as the elapsed 1/0 time to perform the user activities in a serial manner. A definition of file growth in terms of keyword values in addition to the number of actual records in the file results in a nonlinear processing cost function for a linear file growth rate. An earlier study by Shneiderman [21 assumed a linear cost function and finite time horizon T, and a closed form solution for a fixed reorganization point n was determined. It is shown here that with nonlinear costs, an infinite horizon, and a fixed size workload, the time between reorganizat'ion points increases as the file grows. On the other hand, a workload level proportional. to file size results in a series of rapidly decreasing reorganization points [I]. [Footnote] *Current address: Department of Computer Science, Purdue University.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.

Page 1 of 3

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

REORGANIZATION POINTS FOR FILE DESIGNS WITH NONLINEAR PROCFSSING COSTS

by K. Sundar Das, T. J. Teorey, S. B. Yao*

CSD TR 160 Proceedings of the International Conference on Very Large Data Bases September 22-24, 1975 Current address: Department of Computer Science, Purdue university

I. Introduction

An importan t problem in accessing and updating large scale data bases is thedetermination of when to reorganize the data and eliminate records that have been flagged for deletion but have not as yet been physically deleted. Such records take up valuable storage space and cause a degradation of access times to the active records, mainly because of,the necessity to maintain over- flow areas. After a certain length of time it is usually desirable to merge the old data and its overflow areas into a new data base of the oric-inal format, thus reallocating the physical storage for maximum efficiency within the constraints imposed by the data base organization. The purpose of this study is to de ' termine the effect of various system and user parameters on t1le choice of optimal reorganization points over a wide range of file organizations and user applications. The analysis of file reorganization is seen as a first step toward understanding large scale data base reorganization. The models for file organizations Eire based on the hierarchical access tree descriptions developed by Yao [3). This generalized model is applicable to the access mechanisms for all file orvanizations (i.e.. file desi;.Zns) included in study. The model has been implemented and tested with actual files, and is called the File Design Analyzer program [1]. The program accepts as input, various parameters describing the file characteristics, user workload and hardware for storing the fil.e. The output is a summary of secondary storage space and 1/0 processing time required to process the user workload. A near-optimal data reorganization point is determined on the basis of minimum cost of user access to the file. Cost is defined in this context as the product of secondary storage space and 1/0 processing time for a given set of activities over a specified time interval t. Storage space is computed as the space to house the entire file including intermediate data such as pointers, directories, etc. 1/0 processing time for.the user application includes the retrieval, modification, insertion and deletion of records; modi-fication and insertion of keyword values at an intermediate access tree level; and query type retrieval. 1/0 processing time is defined as the elapsed 1/0 time to perform the user activities in a serial manner. A definition of file growth in terms of keyword values in addition to the number of actual records in the file results in a nonlinear processing cost function for a linear file growth rate. An earlier study by Shneiderman [21 assumed a linear cost function and finite time horizon T, and a c...