
Method and system to provide levels of abstraction for ETL component types within a data flow job design
Disclosure Number: IPCOM000247340D
Publication Date: 2016-Aug-25
Document File: 4 page(s) / 114K

Publishing Venue

The Prior Art Database


This disclosure describes a way to construct a data flow design model that lets the designer view a flow at different levels of abstraction. It also lets a designer see how much of a design will be preserved if the implementation level is changed, or if a flow component is substituted with another that is not completely compatible.

This is the abbreviated version, containing approximately 40% of the total text.



The Problem: How to Simplify Views of Complex Data Flows

    Data flow job designs, in the ETL space particularly, are often constructed by picking from a list of supplied component types to create a new instance of an operator or stage, setting various configuration properties relevant to the chosen type, and then linking them together to show how data is intended to flow between them.
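    The design model described above (typed component instances, per-type configuration properties, and links) can be sketched as a small data structure. This is a minimal illustration, not the disclosure's implementation; the class names `ComponentType`, `Component` and `Flow` and the example types `TableSource` and `Sort` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComponentType:
    """A supplied operator/stage type the designer picks from (hypothetical)."""
    name: str
    property_names: frozenset[str]


@dataclass
class Component:
    """An instance of a component type plus its configuration properties."""
    name: str
    ctype: ComponentType
    properties: dict[str, object] = field(default_factory=dict)

    def set_property(self, key: str, value: object) -> None:
        # Only properties relevant to the chosen type can be configured.
        if key not in self.ctype.property_names:
            raise KeyError(f"{self.ctype.name} has no property {key!r}")
        self.properties[key] = value


@dataclass
class Flow:
    """A job design: component instances and the links between them."""
    components: dict[str, Component] = field(default_factory=dict)
    links: list[tuple[str, str]] = field(default_factory=list)

    def add(self, component: Component) -> Component:
        self.components[component.name] = component
        return component

    def link(self, source: str, target: str) -> None:
        # A link records that data flows from `source` into `target`.
        if source not in self.components or target not in self.components:
            raise KeyError("both ends of a link must be components in the flow")
        self.links.append((source, target))


# Hypothetical component types for illustration.
table_source = ComponentType("TableSource", frozenset({"table"}))
sort_type = ComponentType("Sort", frozenset({"keys"}))

flow = Flow()
src = flow.add(Component("read_orders", table_source))
src.set_property("table", "ORDERS")
srt = flow.add(Component("sort_orders", sort_type))
srt.set_property("keys", ["order_id"])
flow.link("read_orders", "sort_orders")
```

    Restricting `set_property` to the type's declared property names mirrors the idea that each type has its own set of relevant configuration properties.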

    The design of such jobs can be extremely complex, involving dozens if not hundreds of operations to describe the required transformation and processing. This can make it difficult to get an overall "feel" for what a flow is designed to do, because the high-level processing logic is buried in detail.

    It can also prove time-consuming to replace a component of a flow with another one of the same logical type, since the configuration properties of the old instance need to be mapped to those of the new instance and transferred across wherever there is a semantic match. You often need to know in advance which properties the mapping will preserve and which it will discard in order to decide whether the substitution is worthwhile.
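    Knowing in advance what a substitution preserves amounts to a dry run of the property mapping. The sketch below assumes a hypothetical per-type-pair table (`property_map`) from old property names to semantically equivalent new ones; the function name and the example property names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MappingPreview:
    preserved: dict[str, object]  # new property name -> value that carries over
    discarded: dict[str, object]  # old property name -> value that would be lost


def preview_substitution(old_properties: dict[str, object],
                         property_map: dict[str, str]) -> MappingPreview:
    """Report which configured properties would survive a substitution.

    `property_map` is a hypothetical table mapping an old property name to
    its semantic match on the new type. Nothing is modified (dry run).
    """
    preserved: dict[str, object] = {}
    discarded: dict[str, object] = {}
    for name, value in old_properties.items():
        if name in property_map:
            preserved[property_map[name]] = value
        else:
            discarded[name] = value
    return MappingPreview(preserved, discarded)


old = {"table": "ORDERS", "isolation_level": "CS", "array_size": 2000}
mapping = {"table": "table_name", "array_size": "fetch_size"}
result = preview_substitution(old, mapping)
print(result.preserved)  # {'table_name': 'ORDERS', 'fetch_size': 2000}
print(result.discarded)  # {'isolation_level': 'CS'}
```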

Existing Solutions: Containment/Custom Mapping

    Most tools provide a way to group operations into some sort of container, or to annotate them to show the higher-level purpose of the group. However, it is up to the user to decide where such group boundaries should be.

    Converting one instance into another can be expedited by providing a tool that knows how to map between the properties of two component types and transfers configuration properties where possible. However, this requires you to attempt the conversion, inspect what was transferred, and undo the change if you don't like the result.
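    The attempt-inspect-undo workflow of such a conversion tool can be sketched as follows. This is an illustrative assumption of how a tool might behave, not a description of any particular product; `Flow.substitute`, `Flow.undo` and the type names `SourceTypeA`/`SourceTypeB` are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    type_name: str
    properties: dict[str, object] = field(default_factory=dict)


class Flow:
    """Holds components by name and supports undoing a substitution."""

    def __init__(self) -> None:
        self.components: dict[str, Component] = {}
        self._undo_stack: list[Component] = []

    def substitute(self, name: str, new_type: str,
                   property_map: dict[str, str]) -> Component:
        """Replace a component with one of `new_type`, transferring the
        properties that `property_map` knows how to translate."""
        old = self.components[name]
        # Keep a full copy of the original so the change can be undone.
        self._undo_stack.append(copy.deepcopy(old))
        transferred = {property_map[key]: copy.deepcopy(value)
                       for key, value in old.properties.items()
                       if key in property_map}
        self.components[name] = Component(name, new_type, transferred)
        return self.components[name]

    def undo(self) -> None:
        """Restore the component replaced by the most recent substitution."""
        previous = self._undo_stack.pop()
        self.components[previous.name] = previous


flow = Flow()
flow.components["read_orders"] = Component(
    "read_orders", "SourceTypeA", {"table": "ORDERS", "isolation_level": "CS"})

# Attempt the conversion and inspect what actually got transferred...
new = flow.substitute("read_orders", "SourceTypeB", {"table": "table_name"})
print(new.properties)  # {'table_name': 'ORDERS'}
# ...then roll back, because isolation_level was lost.
flow.undo()
print(flow.components["read_orders"].type_name)  # SourceTypeA
```

    The weakness the paragraph points out is visible here: the information about what was lost is only available after the change has been made.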

The Proposal: Organize Available Flow Types into a Logical/Physical Hierarchy

    The proposed mechanism makes use of the general notion of sub-classing and inheritance from the object-oriented world, and applies it to the definition of the component types in this data flow design paradigm. Component types are organized into a parent-child hierarchy. Each type is also marked as either "logical" or "physical", and a logical type cannot have a physical parent.
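    A minimal sketch of such a hierarchy, assuming that configuration properties are inherited down the hierarchy in the usual object-oriented way (the disclosure mentions inheritance but this excerpt does not spell out what is inherited). The one structural rule stated above, that a logical type cannot have a physical parent, is enforced at construction time. The type names `Source`, `DatabaseSource` and `JdbcSource` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOGICAL = "logical"
    PHYSICAL = "physical"


@dataclass(frozen=True)
class ComponentType:
    name: str
    level: Level
    parent: ComponentType | None = None
    own_properties: frozenset[str] = frozenset()

    def __post_init__(self) -> None:
        # Hierarchy rule: a logical type may not have a physical parent.
        if (self.level is Level.LOGICAL and self.parent is not None
                and self.parent.level is Level.PHYSICAL):
            raise ValueError(
                f"logical type {self.name!r} cannot have physical "
                f"parent {self.parent.name!r}")

    def ancestors(self) -> list[ComponentType]:
        """Parent, grandparent, ... up to the root."""
        chain: list[ComponentType] = []
        current = self.parent
        while current is not None:
            chain.append(current)
            current = current.parent
        return chain

    def properties(self) -> frozenset[str]:
        """Own properties plus those inherited from ancestors
        (assumed OO-style inheritance of property definitions)."""
        props = set(self.own_properties)
        for ancestor in self.ancestors():
            props |= ancestor.own_properties
        return frozenset(props)


# Hypothetical hierarchy: two logical levels above one physical type.
source = ComponentType("Source", Level.LOGICAL,
                       own_properties=frozenset({"columns"}))
db_source = ComponentType("DatabaseSource", Level.LOGICAL, source,
                          frozenset({"table"}))
jdbc_source = ComponentType("JdbcSource", Level.PHYSICAL, db_source,
                            frozenset({"jdbc_url"}))

print(sorted(jdbc_source.properties()))  # ['columns', 'jdbc_url', 'table']
```

    Because of the rule, every path from a type up to its root passes through zero or more physical types followed only by logical ones, so "moving up" always heads toward the more logical end.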

    This allows you to view a flow at a higher level of abstraction by moving up the hierarchy in a "more logical" direction. Levels of detail can be successively hidden by asking the system to display each component not with its actual type but with one of its parent types.
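    Hiding detail by showing parent types can be sketched as a pure view function: each component's actual type is replaced, for display only, by its ancestor a given number of levels up. The hierarchy table and type names below are hypothetical, and lifting by a fixed number of levels is just one possible policy (lifting to the nearest logical ancestor would be another).

```python
from typing import Optional

# Hypothetical hierarchy: each type name maps to its parent (None = root).
PARENT: dict[str, Optional[str]] = {
    "Source": None,
    "DatabaseSource": "Source",
    "JdbcSource": "DatabaseSource",
    "Transform": None,
    "Sort": "Transform",
    "ExternalSort": "Sort",
}


def lift(type_name: str, levels: int) -> str:
    """Return the ancestor `levels` steps up, stopping at a root."""
    for _ in range(levels):
        parent = PARENT[type_name]
        if parent is None:
            break
        type_name = parent
    return type_name


def abstract_view(flow: dict[str, str], levels: int) -> dict[str, str]:
    """Show each component with a parent type instead of its actual type.
    The flow itself is not modified; only the displayed types change."""
    return {name: lift(actual, levels) for name, actual in flow.items()}


# Component name -> actual (physical) type.
flow = {"read_orders": "JdbcSource", "sort_orders": "ExternalSort"}
print(abstract_view(flow, 1))
# {'read_orders': 'DatabaseSource', 'sort_orders': 'Sort'}
print(abstract_view(flow, 2))
# {'read_orders': 'Source', 'sort_orders': 'Transform'}
```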

    Conversely, one can design a flow using logical components and successively refine it by converting each logical component into one of its child types, until a physical level is reached. This allows a designer to lay out a flow at an abstract level, leaving some of the detail to be added later.
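    Top-down refinement can be sketched as repeatedly replacing a component's type with one of its direct children, while tracking which components are still logical and therefore still need detail. The hierarchy table and the functions `children`, `refine` and `unrefined` are hypothetical names chosen for this illustration.

```python
from typing import Optional

# Hypothetical hierarchy: type name -> (parent, is_logical).
TYPES: dict[str, tuple[Optional[str], bool]] = {
    "Source": (None, True),
    "DatabaseSource": ("Source", True),
    "FileSource": ("Source", True),
    "JdbcSource": ("DatabaseSource", False),
    "CsvSource": ("FileSource", False),
}


def children(type_name: str) -> list[str]:
    """Candidate refinements of a type: its direct child types."""
    return sorted(t for t, (parent, _) in TYPES.items() if parent == type_name)


def refine(flow: dict[str, str], component: str, child_type: str) -> None:
    """Replace a component's type with one of its direct children."""
    current = flow[component]
    if TYPES[child_type][0] != current:
        raise ValueError(f"{child_type} is not a child of {current}")
    flow[component] = child_type


def unrefined(flow: dict[str, str]) -> list[str]:
    """Components still at a logical type, i.e. detail yet to be added."""
    return sorted(name for name, t in flow.items() if TYPES[t][1])


flow = {"read_orders": "Source"}          # laid out abstractly first
print(children("Source"))                  # ['DatabaseSource', 'FileSource']
refine(flow, "read_orders", "DatabaseSource")
print(unrefined(flow))                     # ['read_orders']
refine(flow, "read_orders", "JdbcSource")
print(unrefined(flow))                     # []
```

    Allowing only one level per step keeps each refinement small; a tool could equally offer any descendant, at the cost of skipping intermediate design decisions.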

    It is also then possible to automatically group connected components which h...