Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Method to optimize ETL performance

IP.com Disclosure Number: IPCOM000124635D
Original Publication Date: 2005-May-02
Included in the Prior Art Database: 2005-May-02
Document File: 2 page(s) / 85K

Publishing Venue

IBM

Abstract

Disclosed is a method to optimize ETML (extract, transform, move and load) performance which is usually gated by the use of intermediate or staging tables between ETML steps that can be solely expressed in SQL. The method basically eliminates the use of the staging tables by using table expressions in SQL.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 2

Method to optimize ETL performance

ETML (also known as ETL - extract,transform,move,load) performance has a well known bottleneck: intermediate staging tables. Users of ETL tools like IBM DB2 Warehouse Manager are used to performing ETL in several logical serial steps, where each step in an SQL operation, or an application program which can be sometimes expressed solely with SQL. Output of each step is a staging table which is input to the next step. Several such sequential steps create scalability and performance problems since each step depends on the completion of the previous step in the sequence.

Consider an ETL sequence with three SQL steps(figure 1). In this scenario, the database will optimize and execute each statement separately without being aware of the global context. In contrast, if all three steps are collapsed into a single SQL statement the resulting statement not only avoids generating intermediate table (which in most cases are not needed anyway) at execution time, but it also allows the database to perform query optimization globally on all three statements at once thus being able to generate a better access plan compared to the combined three separate plans. The steps can be collapsed using a feature of SQL called tableexpressions (figure 2).

A good example of such optimization would be that the optimizer might perform function pushdown by evaluating predicates immediately on the sources rather than on the intermediate results, possibly using...