Browse Prior Art Database

Method and technique to use statistical views to optimize federated queries

IP.com Disclosure Number: IPCOM000173431D
Original Publication Date: 2008-Aug-05
Included in the Prior Art Database: 2008-Aug-05
Document File: 8 page(s) / 62K

Publishing Venue

IBM

Abstract

The problem solved by this invention is that of improving the runtime of federated queries. Federated queries, much like regular SQL queries, are optimized by a cost-based optimizer. Such optimizers use statistical information about the objects involved in the query as the basis for their decision making process. Consequently, incorrect or missing statistics about federated objects can lead to bad decisions during query optimization, which in turn leads to the selection of sub-optimal plans. Current solutions rely on gathering statistics for each individual federated object and using these during the planning phase of the optimizer. The statistical information is usually gathered when the nickname is created, and is based on statistics already present at the data source. If the data source statistics are out of date, or have not been gathered, then the statistics available to the federated optimizer do not accurately reflect the object on which the nickname is created. In addition, even with accurate statistical information, it can be difficult for the optimizer to accurately ascertain how a relationship between federated objects (such as a join) or constraint on a federated object (such as a predicate) will affect the output of that operation (such as the cardinality). If the output from the operation is miscalculated by the optimizer, then this 'mistake' is likely to be further compounded by later decisions made by the optimizer which are based on the inaccurate calculation. The net result is that a sub-optimal plan is chosen by the optimizer and run time performance suffers. This invention builds upon the statistical views feature which was introduced in DB2 UDB V8.2 to allow for more accurate calculation of cardinality estimates. Cardinality estimation is the process whereby the optimizer uses statistics to determine the size of partial query results after predicates are applied or aggregation is performed.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 18% of the total text.

Page 1 of 8

Method and technique to use statistical views to optimize federated queries

It is often the case that the cardinality estimates made by the optimizer for SHIP/RPD operators (which send query fragments to the remote data sources,) are inaccurate. The premise of this invention is to use statistical views that are closely tied to the SQL which is sent to the remote data source. By matching the statistical view to the query sent to the remote data source, it is possible to obtain (more) accurate cardinalities for the SHIP/RPD operators. The statistical view will:
a) be based on the SQL which is pushed down to the remote data source (ie: within the SHIP operator for relational sources or RPD operator for non-relational sources),
b) use the actual number of rows returned from the SHIP/RPD operator (counted during runtime) as the CARD value of the statistical view

On subsequent executions of the same federated query, the optimizer will be able to use the accurate information provided in the newly generated statistical view to obtain a far more accurate estimate of the number of rows returned from that SHIP/RPD operator. Consequently, more accurate planning decisions can be made by the optimizer which should lead to better plan selection - hence, improving performance.

One of the main advantages of linking the statistical view closely to the pushed down SQL fragment is that it will provide the optimizer with accurate cardinality information for those SHIP and RPD operators - something that so far has been impossible to achieve. This approach also fits nicely into the typical query tuning process where a query is iteratively tuned until performance is acceptable.

The creation of the stat view and discovery of the number of rows returned by the SHIP/RPD operator can be accomplished either manually or automatically. The first is a manual process/technique which can be implemented today on WS Federated Server v9, the second is a design for an autonomic version which has yet to be implemented.

1. Manually generating the statistical view:

To manually generate a statistical view for the statement pushed down to a data source, one must explain the federated query and look at the details section of the SHIP/RPD operator in question. Within this section, the query fragment sent to the remote data source will be detailed - for relational data sources, this will be in the form of a SQL statement, for non-relational data sources it will be in the form of a nickname, column and predicate list.

For relational data sources, this SQL fragment is in the native syntax of the remote data source, and not DB2 syntax. Therefore, before this SQL can be used to generate a DB2 statistical view (which must be in DB2 syntax), it must go through a 2 step reverse engineering process:

1

Page 2 of 8

    1. SQL translation phase: where the SQL is translated...