Browse Prior Art Database

A method for identifying changes in a massive database that may be described as a dynamic, directed acyclic graph (DAG).

IP.com Disclosure Number: IPCOM000029111D
Original Publication Date: 2004-Jun-16
Included in the Prior Art Database: 2004-Jun-16
Document File: 4 page(s) / 16K

Publishing Venue

IBM

Abstract

Described is a method for storing tree structures that can be expressed as DAGs that: (1) reduces storage requirements; (2) facilitates fast identification of nodes (locations) where changes have been made during an update; and (3) only allows users with appropriate levels of authorization to view database contants, as well as node (e.g., document category) names.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 27% of the total text.

Page 1 of 4

A method for identifying changes in a massive database that may be described as a dynamic, directed acyclic graph (DAG).

A method for identifying changes in a massive database that may be described as a dynamic, directed acyclic graph (DAG).

by Mei Kobayashi, Tony Eng, Issei Yoshida

Described below is a method for storing tree structures that can be expressed as DAGs that: (1) reduces storage requirements;
(2) facilitates fast identification of nodes (locations) where changes have been made during an update; and (3) only allows users with appropriate levels of authorization to view database contants (as well as node (e.g., document category) names. --------------------------------------------------------------------------------------------------------------- Method for reducing storage requirements Assume that the contents of a massive database can be described as a DAG with many repeated sub-DAGS (hereafter referred to
as "repeated components"). The database can be stored in compact form by storing only one full component in the DAG and
replacing repeats with a pointer to the one, full component.

Identification of Repeated Components using One-Way Hash Functions
1. For each node of a DAG, store:

- the location of the information associated with the node

- children and parents of the node (if none, indicate it's a leaf or top node, respectively)

- an "identifying" code number for each leaf. Wwhenever the same leaf appears in a

graph, it should be assigned the same code number.
2. Select a function F with property that for any x belonging to a user-specified domain,

the function will output F(x) = y, belonging to a user-specified range. Furthermore, F

must have the property that for any x1 and x2 belonging to the domain such that x1

and x2 are not the same, F(x1) = y1 and F(x2) = y2 are not the same or are highly

unlikely to be the same. Examples of functions with these properties are:

- the XOR function (given on p. 655 in Cormen, Leiserson and Rivest 1990), where for

the leaf nodes only, the input and output strings are identical (or complements).

- one-way hash functions (see p. 429 in Schneier1996).

- MD2, MD4, MD5

- and others (given in Chapter 18 in Schneier 1996).

For each leaf node, input its identifying code number into F and store the output code

with the other annotations (given in step1, above).
3. For each non-leaf node, use set of output codes from all of its children as the input for

the function F.
4. Iterate the process described in step 3 (from the leaf nodes up to the apex node (top

node) of the graph to generate an output code for all nodes.
5. To locate repeated components, look for branches that have identical output codes.

For branches with identical codes, check to see that the branches and leaves below are

identical, i.e., check to make sure different branches and leaves did not happen to lead

to identical output codes by some highly unlikely coincidence.

There are countless possible minor variations of the abov...