Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

A Mechanism for Programming Error Recovery for Data Warehouse Processes

IP.com Disclosure Number: IPCOM000018880D
Original Publication Date: 2003-Aug-19
Included in the Prior Art Database: 2003-Aug-19
Document File: 2 page(s) / 88K

Publishing Venue

IBM

Abstract

ETL (extract, transform & load) is a standard means of programming the loading of data into a data warehouse. This disclosure presents a technique to allow user-defined error handlers for ETL processes. The necessary error control itself need not be custom-coded for each ETL step but a single ETL Controller can be re-used across all the ETL steps in the ETL project.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 2

A Mechanism for Programming Error Recovery for Data Warehouse Processes

In data warehousing, database programs that implement extract, transform and load (ETL) operations on different source database tables are linked together to define a control flow. Since an ETL step can fail, there must be a mechanism in place to detect the error, execute an error handler program, and then resume the failed ETL step. Data warehouse programming tools, also called ETL tools, commonly provide flow control based on the success or failure of an ETL step (Fig. 1). If ETL step X fails due to error SQLnnn, the user can program the error handler as an ETL step X_SQLnnn. After handling the error, the user would want to retry the failed step X. However, warehousing tools normally leave it to the user to program retrial control flow (e.g., through a database trigger; note that simply linking step X_SQLnnn back to step X would not be a correct solution as repeated failure of step X would cause looping). However, having to program retrials separately for each ETL step X and for each error SQLnnn that it can generate is very cumbersome for the user.

ETL Step

X

               ETL Step YETL Step X_SQLnnn

success failure

Fig 1: Branching on success or failure

Solution

We propose a generic Error Controller mechanism (Fig. 2). Error Controller is a single ETL step to which all ETL steps transfer on failure. Based on the error, Error Controller transfers to an error handler which is a plain SQL file (not an ETL step) that the user provides. Once the error is fixed, Error Controller also restarts the error c...