Browse Prior Art Database

Methods for optimizing queries involving input host variables or parameter markers using distribution statistics

IP.com Disclosure Number: IPCOM000021323D
Original Publication Date: 2004-Jan-13
Included in the Prior Art Database: 2004-Jan-13
Document File: 5 page(s) / 64K

Publishing Venue

IBM

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 26% of the total text.

Page 1 of 5

Methods for optimizing queries involving input host variables or parameter markers using distribution statistics

BACKGROUND INFORMATION

In this article, a method of using the distribution statistics collected by Database Management Systems (DBMS) to optimize queries involving unknown variables (input host variable, parameter marker) in search conditions (A.K.A predicates) is described.

Queries with unknown variables in predicates are usually in the forms:

SELECT EMPLOYEE FROM DEPARTMENT WHERE DEP_NUMBER = :hv; -- HOST VARIABLE

OR

SELECT EMPLOYEE FROM DEPARTMENT WHERE DEP_NUMBER = ? -- PARAMETER MARKER

Input host variables and parameter markers are unknown variables in the query whose actual value will be passed in only at query execution time. Input host variable (:hv) are used for static queries, and parameter markers are used for dynamic queries.

DBMSs such a Relational DBMS (RDBMS) process queries in two distinct phases: the COMPILE phase (also known as PREPARE or BIND phase), and the EXECUTE phase. In the COMPILE phase, the DBMS chooses an optimal Access Path (or access strategy) to compute the result of the query. For static queries, the COMPILE phase can be done in advance for the queries. Once chosen, the access path is stored in an object known as an Access Plan that can be retrieved for later use. The Access Plan is then executed in the EXECUTE phase to produce the result for returning to the application. Typically, a query is associated with a single access plan. Access Plan can be shared and reuse for multiple execute requests by multiple applications executing the same query.

DBMSs collect and maintain a multitude of statistical information about the data they manage in order to optimize access paths for queries on these data. Such statistics include the distribution of the data and frequent occurring values in the data. One important usage of these statistics is in the computing Filter Factor (how many rows in the table or relation will qualify a given predicate) of various predicates in queries. The Filter Factor of predicates is basis for DBMSs to select optimal access paths for queries.

In general, a RDBMS collects different statistics about tables, indexes, and columns within tables. For the scope of this invention, distribution statistics on columns are the most interesting and relevant.

Specifically, DB2 UDB for UWL collects for following statistics for columns in tables:

Column Value Statistics : Column Cardinality Average Column Value Length Second Highest Value in Column Second Lowest Key in Column Number of NULL values in Column

Column Data Distribution Statistics : (the number of frequency or quantile collected are user defined ) * Frequency Statistics :
Frequency Ranking of Column value - The ranking of the occurrence frequency of the Column Value Column Value
Frequency - The number of occurrence of the Column Value

* Quantile Statistics:

1

Page 2 of 5

Quantile Number Column Value Quantile Value Count - a c...