A Methodology for Sanitizing Database Information for Screen Captures and Demonstration Systems

IP.com Disclosure Number: IPCOM000031852D
Original Publication Date: 2004-Oct-14
Included in the Prior Art Database: 2004-Oct-14
Document File: 2 page(s) / 46K

Publishing Venue

IBM

Abstract

A program is disclosed that sanitizes sensitive network information contained in test DB2 databases. This methodology takes a snapshot of the data in a DB2 database, extracts the real system names and IP addresses from database tables, and substitutes fictitious names and demonstration IP addresses.

This is the abbreviated version, containing approximately 52% of the total text.

This document describes a methodology for sanitizing the sensitive data in the database before it is made accessible. Fictitious system names were chosen to represent regions or functions. Fictitious IP addresses were generated by the program in accordance with standards for demonstration purposes.
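The substitution idea can be sketched as follows. The disclosure does not name the standards it followed, so this sketch assumes the 192.0.2.0/24 block (TEST-NET, reserved for documentation in RFC 3330) for addresses and the reserved example.com domain (RFC 2606) for names. The class name, method names, and label scheme are hypothetical. Each real value always maps to the same fictitious value, which is also an assumption rather than a documented behavior of the program.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: names and label scheme are illustrative assumptions,
// not part of the disclosed program.
final class FictitiousValues {
    private final Map<String, String> ipMap = new HashMap<>();
    private final Map<String, String> hostMap = new HashMap<>();
    private final String[] labels;

    /** labels are region or function words, for example "east" or "mail". */
    FictitiousValues(String... labels) {
        if (labels.length == 0) {
            throw new IllegalArgumentException("at least one label is required");
        }
        this.labels = labels.clone();
    }

    /** Maps a real address to a stable address in 192.0.2.0/24 (assumed range). */
    String fakeIp(String realIp) {
        String known = ipMap.get(realIp);
        if (known != null) {
            return known;
        }
        int next = ipMap.size() + 1;
        if (next > 254) {
            throw new IllegalStateException("192.0.2.0/24 is exhausted");
        }
        String fake = "192.0.2." + next;
        ipMap.put(realIp, fake);
        return fake;
    }

    /** Maps a real host name to a stable name under example.com (assumed domain). */
    String fakeHost(String realHost) {
        String known = hostMap.get(realHost);
        if (known != null) {
            return known;
        }
        int n = hostMap.size();
        // Cycle through the labels and add a counter so names stay unique.
        String fake = labels[n % labels.length] + (n / labels.length + 1) + ".example.com";
        hostMap.put(realHost, fake);
        return fake;
    }
}
```

Keeping one map per value type means a host that appears in several tables is replaced by the same fictitious name everywhere, so relationships between rows survive sanitization.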

The methodology for extracting and sanitizing data from database tables is explained below. Note that this utility sanitizes data only in DB2 databases.

The DB2 data sanitization process consists of one Java class that requires four parameters to run successfully. The four parameters, in order, are database name, user name, password, and schema name.
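A minimal sketch of how the four positional parameters might be read is shown here. The class name SanitizerArgs, its fields, and the usage message are hypothetical; the disclosure states only the parameter order.

```java
// Hypothetical sketch: class, field, and message text are assumptions.
final class SanitizerArgs {
    final String database;
    final String user;
    final String password;
    final String schema;

    private SanitizerArgs(String database, String user, String password, String schema) {
        this.database = database;
        this.user = user;
        this.password = password;
        this.schema = schema;
    }

    /** Parses the arguments in the stated order: database, user, password, schema. */
    static SanitizerArgs parse(String[] args) {
        if (args == null || args.length != 4) {
            throw new IllegalArgumentException(
                "Usage: <database> <user> <password> <schema>");
        }
        return new SanitizerArgs(args[0], args[1], args[2], args[3]);
    }

    public static void main(String[] args) {
        SanitizerArgs parsed = parse(args);
        // The password is deliberately not echoed.
        System.out.println("Sanitizing schema " + parsed.schema
            + " in database " + parsed.database + " as user " + parsed.user);
    }
}
```

Rejecting any argument count other than four keeps a missing schema name from silently turning into a run against the wrong schema.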

Once the program is invoked with the appropriate parameters, a database connection is created and a list of tables for the schema provided is retrieved. From each of the tables in the list, a subset of data is retrieved and interrogated. Interrogation is the program's attempt to determine whether or not each column of each table in the list contains an IP address or a host name.
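The table listing and sampling steps could look roughly like the sketch below, which uses the standard JDBC DatabaseMetaData.getTables call and the DB2 FETCH FIRST clause. The class name, method names, and the idea of a fixed row limit are assumptions; the disclosure does not say how large the subset is or how it is selected. The caller is assumed to supply an open java.sql.Connection.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and the row-limit approach are assumptions.
final class TableSampler {

    /** Returns the names of the ordinary tables in the given schema. */
    static List<String> listTables(Connection conn, String schema) throws SQLException {
        List<String> tables = new ArrayList<>();
        DatabaseMetaData meta = conn.getMetaData();
        // "%" matches every table name; only entries of type TABLE are requested.
        try (ResultSet rs = meta.getTables(null, schema, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                tables.add(rs.getString("TABLE_NAME"));
            }
        }
        return tables;
    }

    /** Builds a query that reads at most maxRows rows from one table. */
    static String sampleQuery(String schema, String table, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        return "SELECT * FROM " + quote(schema) + "." + quote(table)
            + " FETCH FIRST " + maxRows + " ROWS ONLY";
    }

    /** Quotes an identifier, doubling any embedded double quotes. */
    private static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }
}
```

DB2 stores unquoted identifiers in upper case, so the schema name passed to listTables generally needs to match the stored form.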

During interrogation, several assumptions are made. Among them are:
1. Host names have a non-integer value in the first position, and positions are separated by periods (abc.xyz.com versus 1.23.45.67)
2. IP addresses and host names do not contain hyphens
3. Version 4 IP addresses contain an integer in the first position, and positions are separated by periods
4. Version 6 IP addresses contain hex numbers in the first position a...
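The first three assumptions can be sketched as a value classifier. The fourth assumption is cut off in this abbreviated text, so Version 6 addresses are deliberately not handled here. The class name, the enum, and the stricter Version 4 check (exactly four positions, each in the range 0 to 255) are assumptions that go beyond what the disclosure states.

```java
// Hypothetical sketch covering assumptions 1 to 3 only.
final class ValueClassifier {
    enum Kind { HOST_NAME, IPV4, OTHER }

    static Kind classify(String value) {
        if (value == null) {
            return Kind.OTHER;
        }
        String v = value.trim();
        // Assumption 2: no hyphens. Positions must also be separated by periods.
        if (v.isEmpty() || v.contains("-") || !v.contains(".")
                || v.chars().anyMatch(Character::isWhitespace)) {
            return Kind.OTHER;
        }
        String[] parts = v.split("\\.", -1);
        for (String part : parts) {
            if (part.isEmpty()) {
                return Kind.OTHER; // leading, trailing, or doubled period
            }
        }
        if (!isInteger(parts[0])) {
            return Kind.HOST_NAME; // Assumption 1: non-integer first position
        }
        // Assumption 3, tightened (an assumption): four octets, each 0 to 255.
        if (parts.length != 4) {
            return Kind.OTHER;
        }
        for (String part : parts) {
            if (!isInteger(part) || part.length() > 3 || Integer.parseInt(part) > 255) {
                return Kind.OTHER;
            }
        }
        return Kind.IPV4;
    }

    private static boolean isInteger(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }
}
```

A column could then be flagged when its sampled values classify as HOST_NAME or IPV4; how many matching values are required is not stated in the disclosure.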