Database query execution time optimization with limiting number of low-level statistics comparisons
Publication Date: 2015-Feb-03
The IP.com Prior Art Database
Optimization of number of metadata checks during query processing in the database/data warehouse taking into account the degree how valuable information given check brings.
Page 01 of 5
Database query execution time optimization with limiting number of low - comparisons
The article relates to technical field of databases. It can be applied to any database implementation including parallel processing implementation (like Hadoop, MPP etc). The idea is valid for the databases which have a low-level statistics implemented - that is metadata about small chunks of data (referred further as extends). Purpose of such metadata is to limit the IO operations to minimum - based on those statistics, system is able to decide if particular extend needs to be read in order to process particular query. Such decision is based on the several checks which need to be performed taken into account metadata and information coming from after parsing the query. Of course from decision point of view, amount of kept metadata should be as rich as possible - more data about rows stored in the extend, more chances to make a correct fact based decision.
Problem which is addressed by the article relates to number of checks which need to be performed on the every extend level during query execution. Proposed solution optimize number of checks taking into account the degree how valuable information given check have.
The main idea of this article is the method of number of metadata checks optimization taking into account the degree how valuable information given check brings. In other
words we want to filter out all metadata checks which with high probability does not bring value - the result is obvious or not meaningful. The checks can be simple number comparison like in the I.e. zone maps which contain min and max value for the particular column in the extend, but it is no limited to this kind of operation only.
Several approach can be used to be able to filter out not needed checks. This article describes two: first one simpler based on the data spread information within database, and second, more sophisticated based on low level query feedback.
To better understand the whole algorithm, let's describe the method step by step:
1. Query A is executed on the database system 2. During execution of the query A, database engine is defining the list of the extends which will be read from the disk: 1. Read metadata about the single extend 2. Perform number of checks to determine if this extend contains interesting data 3. Filter out extend from the list or put it there based on previous step 3. reading from the disk(s) all extends necessary for performing the query based on the list from previous step
4. system executes the calculation defined by the query
5. result is returned to end user/system
Page 02 of 5
This idea is based on the idea of low level statistics (also called zone maps). Low level statistics are metadata which contains statistical information about data stored within one extend (small block of data). Typical statistics are minimum and maximum value in pre-defined order (for string it can be...