Method and System for Automatically Creating Static Data Lineage

IP.com Disclosure Number: IPCOM000190233D
Original Publication Date: 2009-Nov-23
Included in the Prior Art Database: 2009-Nov-23
Document File: 5 page(s) / 55K

Publishing Venue

IBM

Abstract

Disclosed is a method and system for automatically creating static data lineage. The method enables seamless creation of static data lineage reports, which may be developed based on the demands of a user.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 26% of the total text.

Static data lineage is the ability to follow a data element (column/field) from its first appearance in a system (the initial source) through to its ultimate destination (the final target). To create static data lineage, this method traces data stage processes, projects, third-party business intelligence tools, and data modeling tools. The tracing is performed using an Extract, Transform, and Load (ETL) tool. A static data lineage report is then generated, which includes information such as the originating source of the data in an enterprise, the processes involved in transforming the data, the target destination of the data, and details about how the data will be consumed.

To trace the data element, design-time metadata is used to retrieve information about the initial source and the final target. The initial source can be the inception of the column itself, for example, the creation of a data element using a data modeling tool.

Alternatively, the initial source can be any other point in the flow before the final target, such as a staging area. The final target is a point where the column ends its flow.

To obtain the complete path of the data, the physical locations of the intermediate sources and targets are identified and linked. The lowest-level movement of the data is thus traced to obtain individual mapping sets. The lowest-level mapping set is termed an anatomic mapping. Anatomic mappings are created for each stage of a job design that involves movement of the data.

Every stage of a job has input links, output links, or both. Stages with only output links are source stages; stages with only input links are target stages; and stages with both input and output links are flow-through stages. Each anatomic mapping may consist of several source anatomic elements but only one target anatomic element. The source and target anatomic elements are linked together to form a chain of anatomic mapping elements that can be followed from the initial source to the final target. The linking is performed by connecting the respective path and element names of the source anatomic elements and the target anatomic elements.
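
The stage classification and the chaining of anatomic mappings described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The names StageKind, Element, AnatomicMapping, classify_stage and trace_to_initial_sources are hypothetical, as is the assumption that each target element is produced by at most one mapping and that a (path, name) pair is enough to identify an element.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class StageKind(Enum):
    SOURCE = "source"
    TARGET = "target"
    FLOW_THROUGH = "flow-through"


def classify_stage(input_links: int, output_links: int) -> StageKind:
    """Classify a stage by the kinds of links it has."""
    if input_links == 0 and output_links > 0:
        return StageKind.SOURCE
    if input_links > 0 and output_links == 0:
        return StageKind.TARGET
    if input_links > 0 and output_links > 0:
        return StageKind.FLOW_THROUGH
    raise ValueError("a stage must have at least one link")


@dataclass(frozen=True)
class Element:
    path: str  # hypothetical location string identifying where the element lives
    name: str  # column or field name


@dataclass(frozen=True)
class AnatomicMapping:
    sources: Tuple[Element, ...]  # one or more source elements
    target: Element               # exactly one target element


def trace_to_initial_sources(
    final_target: Element, mappings: Iterable[AnatomicMapping]
) -> List[Element]:
    """Walk the mappings backwards from a target element.

    Elements that no mapping produces are treated as initial sources.
    Assumes (hypothetically) at most one mapping per target element.
    """
    by_target = {m.target: m for m in mappings}
    initial, seen, stack = [], set(), [final_target]
    while stack:
        element = stack.pop()
        if element in seen:
            continue  # already visited; also guards against cycles
        seen.add(element)
        mapping = by_target.get(element)
        if mapping is None:
            initial.append(element)
        else:
            stack.extend(mapping.sources)
    return sorted(initial, key=lambda e: (e.path, e.name))
```

Matching a mapping's source elements against another mapping's target element by path and name is what joins the individual mappings into a chain. Walking that chain backwards from a final target yields the elements where the data first appears.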

An example illustrating a job design with three stages is shown in Fig. 1.

Fig. 1: Job design with three stages (figure not included in the extracted text)

In this example, a "Customer" table is accessed in the Open Database Connectivity (ODBC) stage and four columns from the customer table are selected: CustFirstName, CustLastName, CustActiveFlag, and CustZipCode.

A transformer stage allows only active customers to be written to the target file (CustActiveFlag = "Y"); creates one field for first and last names (CustFirstName :" ": CustLastName); and flows through...
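
The two transformer rules stated above can be sketched in Python as follows. This is an illustration only: the function name transform_active_customers and the output field name CustName are hypothetical, and the expression CustFirstName :" ": CustLastName is read here as joining the two names with a single space. The handling of the remaining columns is truncated in this extract, so the sketch leaves them out.

```python
from typing import Dict, Iterable, List


def transform_active_customers(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep active customers and build one combined name field per row."""
    output = []
    for row in rows:
        # Constraint from the text: only rows with CustActiveFlag = "Y" pass.
        if row["CustActiveFlag"] != "Y":
            continue
        # Derivation from the text: first name, a space, then last name.
        # "CustName" is a hypothetical name for the combined field.
        output.append({"CustName": row["CustFirstName"] + " " + row["CustLastName"]})
    return output
```

In lineage terms, the combined field has two source elements (CustFirstName and CustLastName) and one target element, which is the shape of a single anatomic mapping.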