REDUCED COMBINED INDEXES FOR EFFICIENT MULTIPLE ATTRIBUTE RETRIEVAL
Original Publication Date: 1975-Nov-30
Included in the Prior Art Database: 2007-Mar-30
Software Patent Institute
Shneiderman, Ben: AUTHOR [+2]
Reduced Combined Indexes For Efficient Multiple Attribute Retrieval
Reduced Combined Indexes
For Efficient Multiple Attribute Retrieval
Computer Science Department
Bloomington, Indiana 47401
Combined indexes were proposed by Lum 111 as an alternative
to the traditional approach of intersecting multiple single attribute
indexes. The combined index approach is appealing for queries re- quiring conjunctions of attribute values since it eliminates the need for time consuming intersections. The large penalty of wasted auxiliary storage space in the combined index approach can be mini- mized by adopting the reduced combined index technique proposed
in this paper. A detailed description, with a number of variations and suggestions for implementation economies, is presented.
combined indexes, reduced combined indexes, inverted files,
indexing, multiple attribute retrievals, f i l e organization, secon- dary index files, information retrieval, data management, f i l e organ- ization, query.
3.73, 3.74, 3.79.
A variety of techniques exist for single attribute retrievals
from record oriented f i l e structures, but there are few practical methods available for multiple attribute retrievals. We assume a f i l e t o be a collection of records containing a number of keyed attributes and non-keyed attributes. Queries are specified by single keyed attributes or combinations of keyed attributes, for example:
Select student-records where (age equals 18)
Select student-records where (age equals 18) and (home-state equals Indiana)
Select student-records wht?r3e (age equals 18) and (home-state equals Indiana) and (class-standing equals sophomore)
The common practice of inverted indexing requires separate indexes t o each of the attributes. Multiple attribute queries are answered by taking the intersection of the index pointers and then retrieving only the records i n the intersection set. This simple approach
is easy t o implement but the cost of responding t o a query is poten- tially great since the intersection operation can be costly, partic- ularly if the number of attribute values for an attribute is small and the number of records is large.
To remedy the high cost of' performing an intersection each time a query is made, Lum [l] proposed the idea of combined indexes or
compound indexing. Lum recognized that i f , for example, three attrl- butes were t o be keyed then only three combined indexes would be necessary to answer all first, second and third degree queries.
The degree of a query is the number of attributes specified i n a conjunctive query. Queries containing disjunctions can be converted t o a series of conjunctiv...