Browse Prior Art Database

A system and method for efficiently using parameter markers on skewed data

IP.com Disclosure Number: IPCOM000238320D
Publication Date: 2014-Aug-18
Document File: 8 page(s) / 173K

Publishing Venue

The IP.com Prior Art Database

Abstract

To execute a query or a DML statement (INSERT, UPDATE, DELETE), a cost-based optimizer of database server has to create an access plan. An access plan defines the order for accessing tables, which indexes are used, and by what kind of join methods data is correlated. A good access plan is key for the fast execution of an SQL statement.

However, the usage of parameter markers may hinder the optimizer to make optimal use of statistics. The main purpose of using parameter markers is to save the query preparation cost by reusing the same access plan for queries differ only in actual literal values. However, since the actual literal values are not known until runtime, the optimizer can only create a general, sub-optimal access plan hoping it can be good enough for most possible literal values.

For tables with evenly distributed data, this kind of general access plan works just fine (this is so mainly because the optimizer create such plan assuming an even distribution). However, for tables with highly skewed data distribution, this kind of plan usually turns out to be performing not so well.

Therefore, a solution is desired to improve both the optimization performance of the database optimizer and the performance of the SQL statement execution.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 25% of the total text.

Page 01 of 8

A system and method for efficiently using parameter markers on skewed data

To execute a query or a DML statement (INSERT, UPDATE, DELETE), a cost-based optimizer of database server has to create an access plan. An access plan defines the order for accessing tables, which indexes are used, and by what kind of join methods data is correlated. A good access plan is key for the fast execution of an SQL statement.

When creating an access plan, a cost-based optimizer makes decisions based on statistics, which describes important characteristics about database objects (tables, indexes, columns, etc). Having accurate and up-to-date statistics is the single most important factor to have good SQL statement performance.

However, the usage of parameter markers may hinder the optimizer to make optimal use of statistics. The main purpose of using parameter markers is to save the query preparation cost by reusing the same access plan for queries differ only in actual literal values. However, since the actual literal values are not known until runtime, the optimizer can only create a general, sub-optimal access plan hoping it can be good enough for most possible literal values.

For tables with evenly distributed data, this kind of general access plan works just fine (this is so mainly because the optimizer create such plan assuming an even distribution). However, for tables with highly skewed data distribution, this kind of plan usually turns out to be performing not so well.

In a solution, the optimizer uses the first literal value encountered to create an
access plan and then use this same plan for all subsequent execution. The drawback of this solution is that there's no guarantee that the plan created for the first literal value work equally well for all other literal value. Also, the execution performance tends to fluctuate significantly making it hard to get stable performance.

1


Page 02 of 8

In another solution, the optimizer creates an access plan for each actual literal value encountered. This certainly ensures an optimal plan is used for each execution, however the drawback is that this essentially removes the benefit of using parameter markers.

In another solution, the optimizer re-creates the access plan only when the difference between the literal value encountered and the original one used to create the plan is significant enough. Although this solution does reduce some overhead compared to the previous solution of re-creating every time, it still has similar drawbacks and also very high runtime cost.

In yet another solution (US20130159321A1), the optimizer sampled the execution instance history to find an optimal access plan based on the usage frequency of the literal value, the filter factor of the column. The drawback of it is in order to find an optimal access plan, several plans for different literal value need to be created first and this whole evaluation process needs to be performed continuously. The overhead of finding an o...