Feedback Drive Performance Optimization of ETL Processes in Accelerators Across Multiple Source and Target Systems

IP.com Disclosure Number: IPCOM000243900D
Publication Date: 2015-Oct-27

Publishing Venue

The IP.com Prior Art Database

Abstract

In today's business, huge amounts of data are extracted from source systems, transformed, and then loaded into a target system (ETL). These systems may involve structured or unstructured data sources such as database systems. The time spent running ETL processes is critical to business areas that depend on the shortest elapsed time. A chain of ETL operations (where the target system acts as the source for the next ETL stage) also needs to be as fast as possible in order to meet service level agreements (SLAs) and the time constraints of data maintenance windows.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 18% of the total text.

1 Motivation

In today's business, huge amounts of data are extracted from source systems, transformed, and then loaded into a target system (ETL). These systems may involve structured or unstructured data sources such as database systems. The time spent running ETL processes is critical to business areas that depend on the shortest elapsed time. A chain of ETL operations (where the target system acts as the source for the next ETL stage) also needs to be as fast as possible in order to meet service level agreements (SLAs) and the time constraints of data maintenance windows.


1.1 ETL in IDAA

There is a wide variety of reasons for employing ETL processes. In the context of the IBM DB2 Analytics Accelerator (IDAA), ETL is used for bulk copying data from a DB2 for z/OS system to the accelerator, which uses Netezza as its backend database system. (The "transformation" step is rather straightforward and only involves some data type conversions.)

Illustration 1: IDAA Bulk Load Architectural Overview

Illustration 1 shows the architecture of the IDAA bulk load process. "Table A" is bulk-loaded. The multiple "unload" utilities and "USS pipes" indicate parallel processing, which is based on a configuration parameter and the table itself (the number of partitions in the table).
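The partition-level parallelism described above can be illustrated with a minimal sketch. It is not the IDAA implementation: the names `plan_streams`, `unload_partition`, and `bulk_load` are hypothetical, and the rule that the number of parallel streams is the smaller of the configured limit and the partition count is an assumption, since the text only states that both values influence the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_streams(num_partitions, max_parallel):
    """Assumed rule: one stream per partition, capped by the configured limit."""
    if num_partitions < 1 or max_parallel < 1:
        raise ValueError("both values must be at least 1")
    return min(num_partitions, max_parallel)


def unload_partition(table, partition):
    """Stand-in for one unload utility writing a partition into a pipe.

    Here it only fabricates three placeholder rows per partition.
    """
    return [(table, partition, row_id) for row_id in range(3)]


def bulk_load(table, num_partitions, max_parallel):
    """Copy all partitions of one table using a bounded number of workers."""
    streams = plan_streams(num_partitions, max_parallel)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # map() preserves partition order even though the work runs in parallel.
        chunks = pool.map(lambda p: unload_partition(table, p),
                          range(num_partitions))
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    rows = bulk_load("TABLE_A", num_partitions=5, max_parallel=2)
    print(len(rows), "rows copied")
```

With five partitions and a limit of two, the sketch processes at most two partitions at a time, which mirrors the idea of several unload utilities feeding pipes concurrently for a single table.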


As of today, a single LOAD operation in IDAA supports parallel copying of the partitions of one table, but not parallel copying of multiple tables. In order to achieve inter-table parallelism, multiple LOAD operations have to be started manually, and there is no interaction between such LOAD operations. This is further discussed in section 3.2 of the IBM Redbook "Hybrid Analytics Solution using IBM DB2 Analytics Accelerator for z/OS V3.1" [1]. The complexity of the load process and the implicit interactions of the involved components may cause situations where the system is over-utilized, leading to performance degradation, or under-utilized, resulting in a longer elapsed time for the ETL process. Additional workload running on the accelerator (e.g. from other connected DB2 for z/OS systems sharing the accelerator) is also not considered.

2 Solution Overview

The proposed solution consists of logic on the source and target systems of the ETL stage, which are connected through a dedicated communication channel. This channel carries both control information and feedback information. It is the core of the invention, on which the functionality described below builds.

Both sides (source and target) measure various utilization parameters and indicators on the extract, transform, and load operations. A highly utilized system communicates over the feedback connection with its peer to throttle its work if necessary. Similarly, if free resources are still available on all involved components (source system, communication channel, transformation component, and ta...
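The feedback idea can be sketched as a small decision function. This is only an illustration under stated assumptions: the thresholds `HIGH_WATERMARK` and `LOW_WATERMARK`, the `Feedback` record, and the one-step adjustment of the degree of parallelism are all hypothetical. In particular, the abbreviated text is cut off before it says what happens when free resources are available everywhere; the sketch assumes that the work may then be ramped up.

```python
from dataclasses import dataclass
from typing import Iterable

HIGH_WATERMARK = 0.85  # assumed: at or above this, a component counts as highly utilized
LOW_WATERMARK = 0.50   # assumed: at or below this, a component counts as having free resources


@dataclass
class Feedback:
    """One utilization report sent over the (hypothetical) feedback channel."""
    sender: str          # e.g. "source", "channel", "transform", "target"
    utilization: float   # fraction of capacity in use, 0.0 to 1.0


def decide(reports: Iterable[Feedback]) -> str:
    """Return "throttle", "ramp_up", or "hold" for the current set of reports."""
    reports = list(reports)
    if not reports:
        raise ValueError("at least one report is required")
    if any(r.utilization >= HIGH_WATERMARK for r in reports):
        return "throttle"   # some component is highly utilized
    if all(r.utilization <= LOW_WATERMARK for r in reports):
        return "ramp_up"    # assumed reaction when all components have headroom
    return "hold"


def adjust_parallelism(current: int, action: str, max_parallel: int) -> int:
    """Move the degree of parallelism by one step, staying within [1, max_parallel]."""
    if action == "throttle":
        return max(1, current - 1)
    if action == "ramp_up":
        return min(max_parallel, current + 1)
    return current


if __name__ == "__main__":
    reports = [Feedback("source", 0.92), Feedback("target", 0.30)]
    action = decide(reports)
    print(action, adjust_parallelism(4, action, max_parallel=8))
```

Using two thresholds rather than one leaves a band in which nothing changes, which is one simple way to avoid oscillating between throttling and ramping up on every report.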