Method to analyze and validate ETL Job runtime operational metadata using specialized Information Analysis products
Publication Date: 2015-Aug-04
The IP.com Prior Art Database
ETL jobs are designed to extract data from one or more databases, transform the extracted data based on business logic, and finally load the data into one or more databases. Due to complexity and/or changes in the execution environment, the jobs may fail to run, fail to provide the expected results, or provide results that differ from earlier runs. Because of the huge number of jobs, it becomes very difficult to locate such misbehaving jobs until it is too late, resulting in significant loss to the customers. Proposed is a mechanism to analyze and validate ETL Job runtime operational metadata against a baseline using specialized information analysis products and to identify the misbehaving jobs.
ETL jobs are designed to extract data from one or more databases, transform the extracted data based on business logic, and finally load the data into one or more databases. ETL Jobs are designed in a DEV environment, tested in a QA environment, and then moved/deployed to a production environment. Finally, the well-developed Jobs are run in the production environment continuously at the required intervals. In a typical enterprise, there could be a huge number of ETL Jobs running in production.
Due to complexity and/or changes in the execution environment, the jobs may, though very rarely, fail to run, fail to provide the expected results, or provide results that differ from earlier runs, e.g.:
- After an upgrade of the database, an update of the operating system, any patch installation, etc.
- After any change in the environment variables
Likewise, when ETL projects/jobs are migrated or moved from one execution environment to another, the jobs may fail to run, fail to provide the expected results, or provide different results in the target environment, e.g., in the scenarios below:
- Migration from Dev to QA, or QA to PROD
- Migration from an older version of the ETL tool to a newer version
Because of the huge number of jobs, it becomes very difficult to locate such misbehaving jobs until it is too late, resulting in significant loss to the customers. Customers are reluctant to move to newer versions until they are satisfied that the migrated jobs produce the same results in the target environment as before.
The main aim is to provide a method to analyze and validate the ETL Job runtime operational metadata against a baseline using specialized information analysis products or Data Quality assessment and monitoring tools, and to report any anomalies to the user. The entire method can be defined and set up once, after which running it and obtaining the results can be done in an automated manner.
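The baseline comparison described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation and does not use any information analysis product. The metadata fields (`status`, `rows_read`, `rows_written`, `elapsed_seconds`), the job names, and the `max_slowdown` tolerance are all assumptions made for illustration; a real deployment would check whatever operational metadata the ETL tool actually records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    """Operational metadata of one job run (field names are hypothetical)."""
    job_name: str
    status: str
    rows_read: int
    rows_written: int
    elapsed_seconds: float


def find_anomalies(baseline, target, max_slowdown=1.5):
    """Compare each target run with its baseline run and list the differences.

    baseline and target map a job name to a JobRun. max_slowdown is an
    assumed tolerance: a run is flagged when it takes longer than
    max_slowdown times the baseline elapsed time.
    """
    findings = []
    for job_name, base in sorted(baseline.items()):
        run = target.get(job_name)
        if run is None:
            findings.append((job_name, "no run found in target"))
            continue
        if run.status != base.status:
            findings.append(
                (job_name, f"status {run.status}, baseline {base.status}"))
        if run.rows_read != base.rows_read:
            findings.append(
                (job_name, f"rows_read {run.rows_read}, baseline {base.rows_read}"))
        if run.rows_written != base.rows_written:
            findings.append(
                (job_name,
                 f"rows_written {run.rows_written}, baseline {base.rows_written}"))
        if run.elapsed_seconds > base.elapsed_seconds * max_slowdown:
            findings.append(
                (job_name,
                 f"elapsed {run.elapsed_seconds}s, baseline {base.elapsed_seconds}s"))
    return findings


# Made-up baseline ("gold standard") and target metadata for three jobs.
baseline = {
    "load_customers": JobRun("load_customers", "OK", 1000, 1000, 12.0),
    "load_items": JobRun("load_items", "OK", 200, 200, 5.0),
    "load_orders": JobRun("load_orders", "OK", 5000, 4990, 30.0),
}
target = {
    "load_customers": JobRun("load_customers", "OK", 1000, 1000, 13.0),
    "load_orders": JobRun("load_orders", "OK", 5000, 4200, 31.0),
}

for job_name, message in find_anomalies(baseline, target):
    print(f"{job_name}: {message}")
```

Each `if` in `find_anomalies` plays the role of one rule. Keeping the rules as separate checks means a single report can list several findings for the same job, and the tolerance on elapsed time avoids flagging ordinary run-to-run variation.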
- Collect the job runtime operational metadata of properly running jobs/projects and mark it as a baseline (Gold standard data)
- After a job is run, collect the same data in the target
- Create virtual tables on top of baseline and target data using an Information analysis tool
- Design data rules in the information analysis tool on top of virtual tables to analyze the latest job run operational metadata with its corresponding baseline data
- Run/schedule the da...