
Technique for Integrating File Versioning within an ETL Computer Program
Disclosure Number: IPCOM000236973D
Publication Date: 2014-May-23
Document File: 4 page(s) / 29K

Publishing Venue

The Prior Art Database


A process and methodology to improve an Extract Transform Load (ETL) computer program product by integrating the concepts of file versioning in output file option handling, where the ETL program has control over the output file versions, providing increased efficiency and reduction in development costs over current alternatives.

This is the abbreviated version, containing approximately 26% of the total text.


Technique for Integrating File Versioning within an ETL Computer Program

In a database warehouse environment, an ETL (Extract Transform Load) tool is frequently used to prepare files or load tables for analysis and reporting. Currently, none of the ETL tools allows versions of the stored files to be managed. Such versions may be wanted for audit purposes, debugging, or reruns. In today's environment, custom code or additional user intervention is needed for every file that is to be kept beyond the original.

The article proposes techniques for improving an ETL computer program product by integrating file versioning into output file option handling.

The techniques add capabilities to the ETL computer program so that it can identify existing file versions in the target file system and create a new version of a file without overwriting the existing file and without aborting ETL processing when the file in question already exists on the target system. They also let the ETL program control the number of file versions to maintain and the action to take when the maximum number of file versions specified through the ETL program is reached.
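The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration only, not the routine from the disclosure: the function names `next_version_path` and `write_new_version` and the `<base_name>.<n>` naming scheme are assumptions made for the example.

```python
import re
from pathlib import Path


def next_version_path(target_dir: Path, base_name: str) -> Path:
    """Return a path for the next sequential version of base_name.

    Assumed (hypothetical) naming scheme: <base_name>.<n>, with n starting at 1.
    """
    pattern = re.compile(re.escape(base_name) + r"\.(\d+)")
    # Collect the version numbers already present in the target directory.
    existing = [
        int(m.group(1))
        for p in target_dir.iterdir()
        if (m := pattern.fullmatch(p.name))
    ]
    return target_dir / f"{base_name}.{max(existing, default=0) + 1}"


def write_new_version(target_dir: Path, base_name: str, data: str) -> Path:
    """Write data to a new version file and return its path."""
    path = next_version_path(target_dir, base_name)
    # Mode "x" raises FileExistsError instead of overwriting an existing file.
    with open(path, "x", encoding="utf-8") as fh:
        fh.write(data)
    return path
```

Calling `write_new_version(Path("/data/out"), "sales.csv", rows)` twice would leave `sales.csv.1` and `sales.csv.2` side by side, so a rerun neither overwrites the earlier output nor aborts because the file already exists. The directory and file name here are placeholders.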

In today's scenario, additional user programming or manual user intervention is needed to achieve the above in an ETL processing system. When incorporated into the ETL computer program, this technique eliminates that additional programming or manual intervention for what could be a common business need.

The following are the major features of the proposed solution.

An update to an ETL computer program

Where options are added to the ETL program to provide versions of flat file outputs
Where the output versions can be named sequentially or by timestamp
And the options to maintain those versions in the ETL program can be increased or decreased
And the ETL program reads the target directory to find existing files to determine which files, if any, need to be deleted.
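As a rough sketch of the two naming options and of the directory scan that decides which files to delete, the following Python could serve. The function names, the `"sequence"` and `"timestamp"` method labels, the timestamp format, and the choice to treat the oldest files (by modification time) as the ones to delete are all assumptions made for illustration; the disclosure leaves the action at the maximum configurable.

```python
from datetime import datetime
from pathlib import Path


def versioned_name(base_name: str, method: str, sequence: int, now: datetime) -> str:
    """Build a version file name by sequence number or by timestamp."""
    if method == "sequence":
        return f"{base_name}.{sequence}"
    if method == "timestamp":
        # Assumed format: YYYYMMDDHHMMSS appended to the base file name.
        return f"{base_name}.{now:%Y%m%d%H%M%S}"
    raise ValueError(f"unknown versioning method: {method}")


def versions_to_delete(target_dir: Path, base_name: str, max_versions: int) -> list[Path]:
    """Return the versions that exceed max_versions, oldest first.

    Assumes base_name contains no glob metacharacters such as '*' or '['.
    """
    versions = sorted(
        target_dir.glob(f"{base_name}.*"),
        key=lambda p: p.stat().st_mtime,
    )
    excess = len(versions) - max_versions
    return versions[:excess] if excess > 0 else []
```

Raising or lowering `max_versions` between runs changes how many files the scan marks for deletion, which is one way to read the option that the number of maintained versions "can be increased or decreased".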

The following details would be supplied through the ETL computer program: the base file name; the maximum number of versions the target file can have; the action to take when the maximum number of versions is reached; and the versioning method to use, such as appending a sequence number or a timestamp to the end of the base file name. Software changes to the ETL computer program would be required so that the programmer can pass these values into the ETL process.

A standard built-in routine should be designed and coded as part of the ETL software. This routine would be a shell script, or a script in whatever other scripting language the ETL software uses for its existing built-in routines. The basic functionality of this built-in routine would be to read the file names in the target file system based on parameter values provided by the ETL