Data Validation and Delta Processing Procedure

IP.com Disclosure Number: IPCOM000126929D
Original Publication Date: 2005-Aug-12
Included in the Prior Art Database: 2005-Aug-12
Document File: 6 page(s) / 93K

Publishing Venue

IBM

Abstract

Disclosed is a framework for efficiently and effectively performing data validation and delta processing on full data feed files from a third party application.

This is the abbreviated version, containing approximately 34% of the total text.

This framework was devised to replace a manual process for validating a weekly data feed file from an order management system and calculating its deltas. The manual process relied on operating system commands, scripts, editors, and visual comparisons. Data validation is necessary to remove corrupt and invalid data. Delta calculation is necessary to determine what has changed in the order management system since the previous data feed file was processed. The deltas are required by a downstream billing application.

There are numerous known solutions. For example, the data feed files can be processed manually using operating system commands, scripts, spreadsheets, macros, and visual comparisons. The drawbacks of the known solutions are that they are labor intensive, time consuming, and prone to errors, and that they lack metrics.

The core idea behind the framework is to replace a manual process of validating and calculating deltas on a full data feed file. It works by (1) loading the data feed file into a relational table, (2) running SQL against the table contents to validate the data, and (3) running SQL against the table contents to calculate deltas between the current data feed file and the previous data feed file.
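The three steps above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: it assumes Python's standard sqlite3 module as the RDB engine, a hypothetical two-column feed layout (order_id, amount), hypothetical table names (feed_prev, feed_curr), and a hypothetical validation rule that both fields must be non-empty.

```python
import csv
import io
import sqlite3

# Hypothetical feed contents; the real feed layout is not given in the text.
PREVIOUS_FEED = "A1,10.00\nA2,20.00\nA3,30.00\n"
CURRENT_FEED = "A1,10.00\nA2,25.00\nA4,40.00\n,99.00\n"


def load_feed(conn, table, feed_text):
    """Step 1: load a comma-delimited feed into a relational table."""
    conn.execute(f"CREATE TABLE {table} (order_id TEXT, amount TEXT)")
    rows = list(csv.reader(io.StringIO(feed_text)))
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def validate(conn, table):
    """Step 2: use SQL to find invalid rows, then remove them."""
    # Assumed rule for illustration: both fields must be non-empty.
    bad = "TRIM(order_id) = '' OR TRIM(amount) = ''"
    invalid = conn.execute(
        f"SELECT order_id, amount FROM {table} WHERE {bad}"
    ).fetchall()
    conn.execute(f"DELETE FROM {table} WHERE {bad}")
    return invalid


def deltas(conn, curr, prev):
    """Step 3: use SQL to compare the current feed with the previous one."""
    sql = f"""
        SELECT 'ADDED', c.order_id
          FROM {curr} c LEFT JOIN {prev} p ON p.order_id = c.order_id
         WHERE p.order_id IS NULL
        UNION ALL
        SELECT 'REMOVED', p.order_id
          FROM {prev} p LEFT JOIN {curr} c ON c.order_id = p.order_id
         WHERE c.order_id IS NULL
        UNION ALL
        SELECT 'CHANGED', c.order_id
          FROM {curr} c JOIN {prev} p ON p.order_id = c.order_id
         WHERE c.amount <> p.amount
    """
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
load_feed(conn, "feed_prev", PREVIOUS_FEED)
load_feed(conn, "feed_curr", CURRENT_FEED)
invalid_rows = validate(conn, "feed_curr")
delta_rows = deltas(conn, "feed_curr", "feed_prev")
print(invalid_rows)
print(delta_rows)
```

In this sketch the row with the empty order_id is reported and removed during validation, so it never reaches the delta step. Keying the comparison on a single identifier column is also an assumption; a real feed may need a composite key.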

There are four key advantages of using this framework instead of the manual solution.

1. Automated: The process lends itself to automation. An automated process can reduce operating costs.

2. Quick: Results are produced in seconds or minutes instead of hours or days.

3. Error free and accurate: Procedures are performed programmatically using a relational database (RDB) engine and without human intervention, which removes the manual steps that made the accuracy and reliability of the results questionable.

4. Supports metrics and auditability: Stakeholders can query the tables for validation errors, deltas, and audit information (e.g., the name of the data feed file processed, when it was processed, and how long processing took).

This framework can be used for any process requiring data validation and delta processing of a full data feed file.

Process Preparation:

Build the following three tables in an RDB.

Table Name: JOB_LOG

Table Description: This table holds information about the job that is used to process the data feed file. For each data feed file processed, one record will be inserted into this table.

Column Name: JOB_ID
Description: The unique identifier for the job responsible for processing the data validation and delta processing. As new jobs are added, this number is increased sequentially. Specifically, it will start at 0, then go to 1, then 2, then 3, and so on.

Column Name: JOB_ID_COMP
Description: The unique identifier for the previous job the current job used for comparisons during delta processing. This value is always the value of JOB_ID minus 1, except when JOB_ID is 0.

Column Name: FILE_NAME
Description: The name of the file containing the data feed file.
Data Type: Character 100

Column Name: START_TMSTMP
Description: The date and time the job began.
Data Type: Timestamp...
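
The JOB_ID and JOB_ID_COMP rules described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed schema: it uses Python's standard sqlite3 module, defines only the four JOB_LOG columns visible in this text, assumes integer types for JOB_ID and JOB_ID_COMP, and assumes JOB_ID_COMP is left NULL for the first job, since the text does not say what value it takes when JOB_ID is 0. The start_job helper and the file names are hypothetical.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")

# Only the four columns named in the text are defined here.
# INTEGER for the two ID columns is an assumption.
conn.execute("""
    CREATE TABLE JOB_LOG (
        JOB_ID       INTEGER PRIMARY KEY,
        JOB_ID_COMP  INTEGER,
        FILE_NAME    CHAR(100),
        START_TMSTMP TIMESTAMP
    )
""")


def start_job(conn, file_name):
    """Insert one JOB_LOG record for a feed file and return its IDs."""
    (last_id,) = conn.execute("SELECT MAX(JOB_ID) FROM JOB_LOG").fetchone()
    # JOB_ID starts at 0 and increases by 1 for each new job.
    job_id = 0 if last_id is None else last_id + 1
    # JOB_ID_COMP is JOB_ID minus 1, except for the first job
    # (assumed NULL there, because there is nothing to compare against).
    job_id_comp = job_id - 1 if job_id > 0 else None
    started = datetime.now().isoformat(sep=" ")
    conn.execute(
        "INSERT INTO JOB_LOG VALUES (?, ?, ?, ?)",
        (job_id, job_id_comp, file_name, started),
    )
    return job_id, job_id_comp


first = start_job(conn, "feed_week1.txt")   # hypothetical file name
second = start_job(conn, "feed_week2.txt")  # hypothetical file name
print(first, second)
```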