An ETL Framework for Big Data

IP.com Disclosure Number: IPCOM000253738D
Publication Date: 2018-Apr-27

Publishing Venue

The IP.com Prior Art Database

Abstract

ETL (extract, transform, load) moves data from a source to a target, applying transformations dictated by business requirements. These transformations are composed of granular operations such as cleaning, filtering, joining, and aggregation. Our proposed framework abstracts these granular operations as configurable actions that can be plugged together to form production-ready ETL pipelines. A pipeline is defined in JSON, which the user can edit easily and deploy within minutes. The framework is tightly integrated with the Spark DataFrame API, and the DataFrame serves as the basic unit of data flow. The framework is also highly extensible, allowing developers to add their own components easily.
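
To make the idea of JSON-configured, pluggable actions concrete, the following is a minimal sketch and not the framework's actual implementation. Everything named here is an assumption made for illustration: the JSON layout (`actions`, `type`, `params`), the action names `filter` and `aggregate`, the `ACTIONS` registry, and the `action` decorator. To keep the sketch runnable with only the standard library, a plain list of dictionaries stands in for a Spark DataFrame.

```python
import json
from typing import Callable, Dict, List

Row = Dict[str, object]
Rows = List[Row]  # assumption: stand-in for a Spark DataFrame

# Hypothetical registry mapping an action name used in the JSON
# definition to the function that implements it.
ACTIONS: Dict[str, Callable[[Rows, dict], Rows]] = {}


def action(name: str):
    """Register a function as a pluggable action under `name`."""
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register


@action("filter")
def filter_rows(rows: Rows, params: dict) -> Rows:
    # Keep only rows whose `column` equals the configured value.
    return [r for r in rows if r.get(params["column"]) == params["equals"]]


@action("aggregate")
def aggregate_sum(rows: Rows, params: dict) -> Rows:
    # Sum the `value` column for each distinct `group_by` key.
    totals: Dict[object, int] = {}
    for r in rows:
        key = r[params["group_by"]]
        totals[key] = totals.get(key, 0) + r[params["value"]]
    return [{params["group_by"]: k, params["value"]: v} for k, v in totals.items()]


def run_pipeline(definition: str, rows: Rows) -> Rows:
    """Apply each configured action in order, passing its output to the next."""
    for step in json.loads(definition)["actions"]:
        rows = ACTIONS[step["type"]](rows, step.get("params", {}))
    return rows


# Hypothetical pipeline definition; editing this JSON changes the pipeline
# without touching the action code.
PIPELINE_JSON = """
{
  "actions": [
    {"type": "filter", "params": {"column": "status", "equals": "ok"}},
    {"type": "aggregate", "params": {"group_by": "region", "value": "amount"}}
  ]
}
"""

SAMPLE = [
    {"region": "eu", "status": "ok", "amount": 10},
    {"region": "eu", "status": "bad", "amount": 99},
    {"region": "us", "status": "ok", "amount": 5},
    {"region": "eu", "status": "ok", "amount": 7},
]

result = run_pipeline(PIPELINE_JSON, SAMPLE)
```

In this sketch, extending the pipeline amounts to registering one more function with `@action(...)` and referencing its name in the JSON, which is one plausible way to read the extensibility claim above. In a Spark-based version, each action would presumably take and return a DataFrame instead of a list of rows.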