Browse Prior Art Database

THE OPTIMAL SELECTION OF SECONDARY INDICES FOR FILES

IP.com Disclosure Number: IPCOM000148134D
Original Publication Date: 1974-Nov-30
Included in the Prior Art Database: 2007-Mar-29

Publishing Venue

Software Patent Institute

Related People

Schkolnick, Mario: AUTHOR [+2]

Abstract

THE OPTIMAL SELECTION OF SECONDARY INDICES FOR FILES Mario SchkolnickCarnegie-Mellon University Pittsburgh, Pa. 15213 November 1 974 ABSTRACT We consider the problem of finding an optimal set of indices for a file. A general model for a file is assumed together with a probabilistic model of the transactions condu:ted with it: Queries, Updates, Insertions and Deletions. It is shown that all the information assumed for each attribute can be condensed into two parameters and that properties of the optimal solution can be derived from this condensed information. An algorithm to find the optimal set of indices based on these properties is exhibited. This work was supported hy Defense Advanced Research Projects A~ency under Contract F44620-73-C-0074. INTRODUCTION We consider here the problem of selecting a set of attributes for which a secondary index should be provided in order to minimize the expected cost per transaction conducted with a file. A proper solution to this problem has to coilsider !he system in which the flie will exist as well as characteristics such as accesstng merhanlsms to ~t and statistical properties of the transactions that are conducted with the file, Some systems which have been implemented use the sinlple approach of providing indices for all attributes [2] while others do not provide indices at all [6]. For these systems the optimization problem does not exist. Between these two extreme approaches there are systems like XRM [8] which al~sw the user to specify which domains of the relation should be indexed. Some high level larguages have been designed (4) for which the system, while in the process of perforwing a transaction, creates temporary indices fof some attributes. After the transaction is completed, the user is informed of the newly created ind~ces and may then choose to keep them or delete them. Systems such as these and others for which on tics transacticns may be conducted with a file by users whose demands on the file change in t~me are best suited for solutions as described here. After collecting statistical properties of the transaction that the current set of users conducts with the file, the system may campute an optimal set of indices to minimize transaction processing time. Then, it creates its optimal irdex set throwing away those indices which contribute to increase transaction time and creating indices which help to decrease transaction time. Since this overhead processinz may be appreciable, the system will probably perform this updating only at some fixed in!ervals of time.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 11% of the total text.

Page 1 of 18

THE OPTIMAL SELECTION OF SECONDARY INDICES FOR FILES Mario Schkolnick
Carnegie-Mellon University

Pittsburgh, Pa. 15213

November 1 974

ABSTRACT

     
We consider the problem of finding an optimal set of indices for a file. A general model for a file is assumed together with a probabilistic model of the transactions condu:ted with it: Queries, Updates, Insertions and Deletions. It is shown that all the information assumed for each attribute can be condensed into two parameters and that properties of the optimal solution can be derived from this condensed information. An algorithm to find the optimal set of indices based on these properties is exhibited.

This work was supported hy Defense Advanced Research Projects A~ency
under Contract F44620-73-C-0074.

[This page contains 1 picture or other non-text object]

Page 2 of 18

[This page contains 1 picture or other non-text object]

Page 3 of 18

INTRODUCTION

     
We consider here the problem of selecting a set of attributes for which a secondary index should be provided in order to minimize the expected cost per transaction conducted with a file.

      A proper solution to this problem has to coilsider !he system in which the flie will exist as well as characteristics such as accesstng merhanlsms to ~t and statistical properties of the transactions that are conducted with the file, Some systems which have been implemented use the sinlple approach of providing indices for all attributes [2] while others do not provide indices at all [6]. For these systems the optimization problem does not exist. Between these two extreme approaches there are systems like XRM [8] which al~sw the user to specify which domains of the relation should be indexed. Some high level larguages have been designed (4) for which the system, while in the process of perforwing a transaction, creates temporary indices fof some attributes. After the transaction is completed, the user is informed of the newly created ind~ces and may then choose to keep them or delete them. Systems such as these and others for which on tics transacticns may be conducted with a file by users whose demands on the file change in t~me are best suited for solutions as described here. After collecting statistical properties of the transaction that the current set of users conducts with the file, the system may campute an optimal set of indices to minimize transaction processing time. Then, it creates its optimal irdex set throwing away those indices which contribute to increase transaction time and creating indices which help to decrease transaction time. Since this overhead processinz may be appreciable, the system will probably perform this updating only at some fixed in!ervals of time.

Recently, a number of studies have appeared in the literature which consider this

8

[This page contains 1 picture or other non-text object]

Page 4 of 18

problem....