Browse Prior Art Database

A METHOD AND APPARATUS FOR QUERY SAMPLING

IP.com Disclosure Number: IPCOM000013582D
Original Publication Date: 2000-Nov-01
Included in the Prior Art Database: 2003-Jun-18
Document File: 2 page(s) / 46K

Publishing Venue

IBM

Abstract

A METHOD AND APPARATUS FOR QUERY SAMPLING

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 2

A METHOD AND APPARATUS FOR QUERY SAMPLING

A METHOD AND APPARATUS FOR QUERY SAMPLING

Queries against files or databases used for decision support or business intelligence can take many minutes, up to days, to complete. This can be due to very large objects (tables or indices) being accessed or due to complex queries requiring joins between many tables which may involve sorting intermediate results.

Sometimes, an exact result set is not necessary from long-running queries. Sometimes, a "rough guess" is adequate.

This invention provides a means to reduce the elapsed time for queries by "skipping" a specified quantity of data to be accessed by the query.

Disclosed is a method for "skipping" a specified quantity of data accessed by a query resulting in shorter elapsed time.

Decision support or business intelligence systems often exists to help answer "what-if" questions asked by decision makers. For example, the decision maker may want to understand if there is a correlation between the sale of lawn mowers, geographical locations and time of year. However, the decision maker is not interested in exact figures -- he is more interested in seeing if there exists a correlation, and then digging deeper into detailed data. This is a perfect scenario for query sampling. The decision maker can start at a very high sampling rate (meaning skip lots of the raw data), ask the question, and then decide to abandon the analysis, or to dig deeper. He may decrease the sampling rate to increase his confidence in the answer (result set) and resubmit the same query.

The interface for this can be implemented in several different ways. One technique would be via new syntax or with some environmental setting. For example, using SQL:

SELECT *

FROM PARTS

WHERE P_NUMBER = 1024

WITH SAMPLING AT 10%;

-OR-

SET SAMPLING_RATE = '10%';

SELECT *

FROM PARTS

WHERE P_NUMBER = 1024;

-OR-

A BIND parameter of SAMPLE with a parameter value of 10 when compiling an SQL application.

Another technique would be to implement this as a time-based function. For example, "I only have five minutes to ask and receive answers to this question." With this technique, a cost-based optimizer can calculate the required sampling rate since an estimated elapsed time is already known. The same three interfaces as used for the SAMPLE keyword could also be used for this time-based function.

All or just a few selected access paths could be eligible for sampling. Here are a...