Browse Prior Art Database

Minimizing Page Faults to a Finite Buffer As Applied to Nested Scans to Satisfy a Relational Join

IP.com Disclosure Number: IPCOM000039848D
Original Publication Date: 1987-Aug-01
Included in the Prior Art Database: 2005-Feb-01
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Lohman, GM: AUTHOR [+2]

Abstract

This invention relates to a method for optimizing page faults to a finite buffer when a sequence of pages is to be accessed repeatedly. More particularly, the method is applied to the nested scanning of a data base to satisfy a relational join operation. In the prior art, it is known that a join between an outer table O and an inner table I finds all values in the join column I.J of table I that match a value of the join column O.J in table O. The nested-loop join technique accomplishes this in the most obvious way, by scanning table I for each successive row in O. Each scan goes through I from beginning to end in either physical order (a DBSPACE scan) or in the logical order of some index (an index scan).

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Page 1 of 2

Minimizing Page Faults to a Finite Buffer As Applied to Nested Scans to Satisfy a Relational Join

This invention relates to a method for optimizing page faults to a finite buffer when a sequence of pages is to be accessed repeatedly. More particularly, the method is applied to the nested scanning of a data base to satisfy a relational join operation. In the prior art, it is known that a join between an outer table O and an inner table I finds all values in the join column I.J of table I that match a value of the join column O.J in table O. The nested-loop join technique accomplishes this in the most obvious way, by scanning table I for each successive row in O. Each scan goes through I from beginning to end in either physical order (a DBSPACE scan) or in the logical order of some index (an index scan). The cost of the nested-loop join technique is largely controlled by how many pages of I have to be accessed (fetched from disk) for each new row of O. If I has an index on I.J, then that index can be used to limit the scan of I to just those satisfying the join predicate O.J = I.J for a particular value of O.J, called a key domain. When there is either no key domain or high duplication in the join column (i.e., many rows having the same value), successive scans of I for each row of O will reference the same pages as the previous scan. If the pages referenced (the reference string) can be retained in the buffer, then the nested- loop join is often preferable to other join techniques such as the merge-scan join or hash join. This is the intent, for example, of a fragmentation scan which hashes the rows of O and I to buffer- sized buckets for I having common join- column values so that the pages of I do not have to be reaccessed. When the reference string does not fit in the buffer, it has been suggested to give up trying to keep pages of I in the buffer and using a most-recently-used (MRU) page replacement strategy. This prior art approach throws out each page of I immediately, recognizing that the buffer is very likely to be filled with other pages before that page is re-referenced. The method of the invention, in contrast to the prior art, maximizes the reuse of pages of I already in the buffer by alternately scanning I in opposite directions. This is performed in a manner analogous to the boustrophedonic motion of print heads on current character-at-a-time printers. In the invention, a toggle bit must be maintained by the component initiating the scans of I (the RDS in System R) to remember in which direction table I was last scanned. For each successive outer tuple value, the toggle bit is flipped...