A system and method for improving data migration

IP.com Disclosure Number: IPCOM000185197D
Original Publication Date: 2009-Jul-15
Included in the Prior Art Database: 2009-Jul-15
Document File: 7 page(s) / 99K

Publishing Venue

IBM

Abstract

Currently, data migration projects are labor intensive, long running, and expensive. The idea proposed in this invention disclosure accelerates such projects while still allowing data quality enhancement functionality, embedded in a sophisticated data migration methodology, to be plugged in. At the same time, it reduces the laborious part of the work, so the same result can be achieved with fewer people than before. Please note that our approach is applicable to migration, consolidation, harmonization, and implementation projects as well. The objective of this invention disclosure is to accelerate data migration projects by minimizing repetitive, labor-intensive, yet simple tasks. Instead of developing the corresponding artifacts manually, we generate them from metadata extracted from the source and target systems of the data migration project.

This is the abbreviated version, containing approximately 20% of the total text.

Background

On a more technical level, data migration is the problem of transporting the application data of one or more source systems into a target system. Source and target systems may use different data structures to represent the same type of data: attributes or fields may have different names (e.g., NAME1 and LASTNAME), data may be split up differently (e.g., STREETNAME and HOUSENUM in two separate fields vs. ADDRESSLINE1 in a single field), or fields may be grouped differently in a container data structure (e.g., having name and address in the same data structure vs. separating them into different data structures), to name just a few. These structural heterogeneities are commonly addressed by transformation logic that is developed within the scope of the data migration project. Furthermore, the data values themselves may require translation (e.g., the marital status "married" is represented as "marr." in one system and as "01" in another system). These semantic heterogeneities are commonly addressed by mapping tables and transformation rules that are developed within the scope of the data migration project. The definitions of the container data structures and the fields are called metadata.
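
To make the two kinds of heterogeneity concrete, the following is a minimal sketch of hand-written transformation logic for a single record. It is an illustration only, not the method of this disclosure. It assumes that NAME1, STREETNAME, and HOUSENUM belong to the source system and LASTNAME and ADDRESSLINE1 to the target system (the text does not say which side each field is on), and the field name MARITAL_STATUS and the function name transform_record are hypothetical.

```python
# Mapping table for a semantic heterogeneity. The two codes "marr." and "01"
# come from the text; the field name MARITAL_STATUS is a hypothetical choice.
MARITAL_STATUS_MAP = {"marr.": "01"}


def transform_record(source: dict) -> dict:
    """Transform one source record into the assumed target layout."""
    target = {}

    # Structural heterogeneity: the same attribute under a different name.
    target["LASTNAME"] = source["NAME1"]

    # Structural heterogeneity: two source fields merged into one target field.
    target["ADDRESSLINE1"] = f'{source["STREETNAME"]} {source["HOUSENUM"]}'.strip()

    # Semantic heterogeneity: translate the value through the mapping table.
    status = source["MARITAL_STATUS"]
    if status not in MARITAL_STATUS_MAP:
        # Unmapped values are rejected rather than passed through silently.
        raise ValueError(f"no target code for marital status {status!r}")
    target["MARITAL_STATUS"] = MARITAL_STATUS_MAP[status]

    return target


record = transform_record(
    {
        "NAME1": "Smith",
        "STREETNAME": "Main Street",
        "HOUSENUM": "12",
        "MARITAL_STATUS": "marr.",
    }
)
```

Every rename, merge, and value translation in such a function has to be written and maintained by hand for each data structure, which is the repetitive work the disclosure aims to reduce.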

Currently, the following three approaches are employed in data migration projects:

The most common approach in data migration projects is to manually develop all artifacts within the scope of the data migration project. This involves a great deal of manual work, typically based on specifications in the form of spreadsheets or text documents. A person needs to transform those specifications into the artifacts that perform the actual data movement and data transformation.

This can be accelerated by creating data migration templates upfront, consisting of a blueprint version of all artifacts required for performing the data migration. This requires advance knowledge of the source and target systems, and the systems used for blueprinting need to be reasonably similar to the actual systems used in the data migration project. Still, due to client-specific data structures, client-specific business processes, and client-specific data, these blueprint artifacts need to be manually adapted within the scope of the data migration project. Because the blueprint artifacts are generic, they tend to be more complex than client-specific artifacts built entirely by hand. Hence, adapting them is about as complex as creating client-specific artifacts from scratch.

Some approaches generate data movement and data transformation artifacts from manually specified but machine-readable mapping specifications. However, those approaches address only subparts of the entire data migration task. They do not directly connect to the source or target systems, but ope...