
Method and System for Improving Extract, Transform, and Load (ETL) Job Run Time by Moving Out the Job Initialization Time

IP.com Disclosure Number: IPCOM000201542D
Publication Date: 2010-Nov-15
Document File: 2 page(s) / 48K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system for improving Extract, Transform and Load (ETL) job run time by moving out the job initialization time is disclosed. The method includes composing a score for each job and broadcasting the composed score to all the nodes based on the configuration file of the job.

This is the abbreviated version, containing approximately 52% of the total text.



Disclosed is a method and system for improving ETL job run time by moving job initialization work out of the job run itself. A service is provided that composes a score for each job using idle cycles of the system. The composed score is broadcast to all nodes in the system based on the job's configuration file.
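The broadcast step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the configuration file lists nodes in the form node "name" { ... }, extracts only the node names, serializes the score as JSON, and hands it to a hypothetical send(node, payload) transport hook for each node.

```python
import json
import re
from typing import Callable, Dict, List

# Assumption: node entries in the job's configuration file look like
#   node "node1" { fastname "host1" pools "" ... }
# Only the quoted node names are extracted; a real parser would also read
# fastname, pools and resource entries.
NODE_PATTERN = re.compile(r'\bnode\s+"([^"]+)"')


def nodes_from_config(config_text: str) -> List[str]:
    """Return node names in the order they appear in the configuration text."""
    return NODE_PATTERN.findall(config_text)


def broadcast_score(score: Dict, config_text: str,
                    send: Callable[[str, bytes], None]) -> List[str]:
    """Serialize a precomposed score once and send it to every configured node.

    `send(node, payload)` is a hypothetical transport hook (socket, RPC,
    shared file system, ...). Returns the nodes the score was sent to.
    """
    payload = json.dumps(score, sort_keys=True).encode("utf-8")
    nodes = nodes_from_config(config_text)
    for node in nodes:
        send(node, payload)
    return nodes
```

Because the score is serialized once and reused for every node, the per-node cost reduces to the transfer itself, which can also happen ahead of the run.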

Jobs such as DataStage*

time environment after a fair amount of testing. A job scheduler is used to schedule the deployed jobs at run time in the system. Typically, the score associated with the execution of DataStage jobs is composed at job run time. The score provides parallel-executing jobs with information about "what to execute and where to execute" it. The score may also include information about data partitioning. Further, the transmission of the score to the nodes in the system and the building and transfer of transform operator libraries are also done at job run time.
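To make the contents of a score concrete, the sketch below models it as a list of operator placements. The class and field names are hypothetical and the partitioning values are illustrative only; the point is that a score records "what to execute" (the operator), "where to execute" it (the nodes) and, optionally, how its data is partitioned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperatorPlacement:
    """One hypothetical score entry: an operator, its nodes, and its partitioning."""
    operator: str                # "what to execute"
    nodes: List[str]             # "where to execute"
    partitioning: str = "auto"   # illustrative values, e.g. "hash", "round_robin"


@dataclass
class Score:
    job_name: str
    placements: List[OperatorPlacement] = field(default_factory=list)

    def operators_on(self, node: str) -> List[str]:
        """Operators that a given node must run according to this score."""
        return [p.operator for p in self.placements if node in p.nodes]
```

For example, a score that places a source reader on node1 and a hash-partitioned transform on node1 and node2 tells node2 that it only needs to run the transform.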

The method and system disclosed herein provide a service/daemon for improving the run time of a job, such as an ETL job. Deployed jobs are stored in a repository and include information about their configuration file in the form of an environment variable. For example, APT_CONFIG

the environment variables is not available for certain jobs, then a default configuration file is provided for such jobs.
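The configuration-file lookup with its default fallback can be sketched as below. This is an illustration under stated assumptions: the job's stored environment variables are available as a mapping, the variable name is passed in as config_var rather than hard-coded, and DEFAULT_CONFIG_FILE is a hypothetical path.

```python
from typing import Mapping, Optional

# Hypothetical default used when a job does not name its own configuration file.
DEFAULT_CONFIG_FILE = "/opt/etl/configs/default.apt"


def resolve_config_file(job_env: Mapping[str, str], config_var: str,
                        default: str = DEFAULT_CONFIG_FILE) -> str:
    """Return the configuration file recorded for a deployed job.

    `job_env` holds the environment variables stored with the job in the
    repository; `config_var` names the variable that points at the
    configuration file. If it is missing or empty, fall back to `default`.
    """
    value: Optional[str] = job_env.get(config_var)
    return value if value else default
```

Treating an empty value the same as a missing one is a design choice of this sketch; it keeps a job with a blank setting from being scored against a nonexistent file.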

A service/daemon examines each job in the repository and composes a score for each job. The service/daemon composes the score by utilizing idle cycles of the system.

Thus, the score composition runs as a low-priority process. After th...
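The idle-cycle composition can be sketched as a daemon pass that only composes scores while the system looks idle, plus a run-time lookup that reuses them. Everything here is an assumption for illustration: "idle" is approximated by the 1-minute load average per CPU (os.getloadavg, Unix only), compose stands in for the real score compiler, and the fallback to on-demand composition on a cache miss is not taken from the disclosure.

```python
import os
from typing import Callable, Dict, Iterable, Optional


def system_is_idle(threshold: float = 0.5) -> bool:
    """Heuristic idleness check: 1-minute load average per CPU below `threshold`.

    Uses os.getloadavg(), which is only available on Unix-like systems.
    """
    load_1min, _, _ = os.getloadavg()
    return load_1min / (os.cpu_count() or 1) < threshold


def precompose_scores(job_names: Iterable[str],
                      compose: Callable[[str], dict],
                      cache: Dict[str, dict],
                      is_idle: Callable[[], bool] = system_is_idle) -> int:
    """Compose and cache scores for jobs that have none, only while the system is idle.

    Returns the number of scores composed in this pass. A daemon would call
    this periodically and could lower its own priority first (e.g. os.nice on Unix).
    """
    composed = 0
    for name in job_names:
        if name in cache:
            continue
        if not is_idle():
            break  # yield to real work; resume on the next pass
        cache[name] = compose(name)
        composed += 1
    return composed


def score_for_run(name: str, compose: Callable[[str], dict],
                  cache: Dict[str, dict]) -> dict:
    """At job run time, reuse a precomposed score; compose only on a cache miss."""
    score: Optional[dict] = cache.get(name)
    if score is None:
        score = compose(name)
        cache[name] = score
    return score
```

With the scores composed ahead of time, the scheduler's run-time path reduces to a cache lookup for every job the daemon has already processed.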