THE OPTIMAL SELECTION OF SECONDARY INDICES FOR FILES
Original Publication Date: 1974-Nov-30
Included in the Prior Art Database: 2007-Mar-29
Software Patent Institute
Schkolnick, Mario: AUTHOR [+2]
AbstractTHE OPTIMAL SELECTION OF SECONDARY INDICES FOR FILES Mario SchkolnickCarnegie-Mellon University Pittsburgh, Pa. 15213
THE OPTIMAL SELECTION OF SECONDARY INDICES FOR FILES Mario Schkolnick
Pittsburgh, Pa. 15213
November 1 974
We consider the problem of finding an optimal set of indices for a file. A general model for a file is assumed together with a probabilistic model of the transactions condu:ted with it: Queries, Updates, Insertions and Deletions. It is shown that all the information assumed for each attribute can be condensed into two parameters and that properties of the optimal solution can be derived from this condensed information. An algorithm to find the optimal set of indices based on these properties is exhibited.
This work was supported hy Defense Advanced Research Projects A~ency
under Contract F44620-73-C-0074.
We consider here the problem of selecting a set of attributes for which a secondary index should be provided in order to minimize the expected cost per transaction conducted with a file.
A proper solution to this problem has to coilsider !he system in which the flie will exist as well as characteristics such as accesstng merhanlsms to ~t and statistical properties of the transactions that are conducted with the file, Some systems which have been implemented use the sinlple approach of providing indices for all attributes  while others do not provide indices at all . For these systems the optimization problem does not exist. Between these two extreme approaches there are systems like XRM  which al~sw the user to specify which domains of the relation should be indexed. Some high level larguages have been designed (4) for which the system, while in the process of perforwing a transaction, creates temporary indices fof some attributes. After the transaction is completed, the user is informed of the newly created ind~ces and may then choose to keep them or delete them. Systems such as these and others for which on tics transacticns may be conducted with a file by users whose demands on the file change in t~me are best suited for solutions as described here. After collecting statistical properties of the transaction that the current set of users conducts with the file, the system may campute an optimal set of indices to minimize transaction processing time. Then, it creates its optimal irdex set throwing away those indices which contribute to increase transaction time and creating indices which help to decrease transaction time. Since this overhead processinz may be appreciable, the system will probably perform this updating only at some fixed in!ervals of time.
Recently, a number of studies have appeared in the literature which consider this