Browse Prior Art Database

System and Method for Purging Data Records in Large Databases

IP.com Disclosure Number: IPCOM000125486D
Original Publication Date: 2005-Jun-03
Included in the Prior Art Database: 2005-Jun-03
Document File: 5 page(s) / 68K

Publishing Venue

IBM

Abstract

Many backend applications keep a database representing the records being processed by the application, such as retail order database for an online store, or an event database for an event management solution.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 42% of the total text.

Page 1 of 5

System and Method for Purging Data Records in Large Databases

The idea is two use a data separation model where there is "current" data and "old" data The separation between these sets is defined by a system administrator, not by the application that manages the data.

The application always writes new data to the "current" set of tables but performs queries on both "current" and "old" data.

Archiving and purge can only be executed on "old" data.

The core idea is to separate the data in two conceptual blocks: one for data creation ("current") and another for archival/purge operations ("old")

The advantages are:

Performance for data creation is not affected


1.


2.


3.


4.


5.

System structure

The application has logic that addresses two sets of tables (the word 'set' is used to represent the fact that multiple tables may be required to represent the application records) on a Relational Database Management System (RDBMS), such as DB2 or Oracle. One is called "A", the other is called "B". During runtime, the application can determine which one "active" and which one has been "staged" for archiving and purge.

The architecture can be depicted in Figure 1.

Figure 1 - System structure for the data management

Performance for queries is affected in a predictable manner (there are always only two sets of tables

to be searched)

Performance for archiving is optimal as it is not done concurrently with insertion

Performance for purging is optimal as the set of "old" data can be deleted with RDBMS utilities

instead of SQL DELETE statements.

The administrator is in control of the policy for data archival and purging, not the application logic.

RDBMS

"A" tables

New data

Query

Query

"B" tables

Application runtime

swaps

The "application runtime" is responsible for handling requests for data creation and for data searches.

"A" tables and "B" tables can assume the status of "active" and "staged" during runtime. "Both "A and "B" tables share the same data structure, such as table names, column names, indexes, and others.

"Staged" and "active" are mutually exclusive states, in other words, when "A" is holding active data, "B" us

Staging utility

Extract Delete

Archiving utility

Purging utility

1

[This page contains 6 pictures or other non-text objects]

Page 2 of 5

holding staged data; and vice-versa.

"Active" tables are used to store the new data arriving in the system. The data in the "active" tables is not accessed by archival and purging utilities.

"Staged" tables contain data that can be archived and purged at some point in time. New data is never written to the "staged" tables.

The "archiving" utility uses RDBMS native commands to copy the contents of the set of tables ("A" or "B") currently marked as "staged" to another location (tape for instance)

The "purging" utility uses RDBMS native commands to quickly erase all contents of the "staged" tables. This command typically locks the tables for the brief duration of the operation.

The "staging" utility sw...