Program for Determining Text Updates

IP.com Disclosure Number: IPCOM000084434D
Original Publication Date: 1975-Nov-01
Included in the Prior Art Database: 2005-Mar-02
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Hopper, TR: AUTHOR (and 3 others)

Abstract

The following describes a program that determines the differences between an "old" and a "new" version of a data set. Using heuristics, the program determines an approximately minimal number of record deletions and additions that would convert the old data set into the new data set.

The program is designed to compare large data sets. A primary objective is to minimize data movement and, consequently, running time. One use of the program is to create summaries of changes to source code at preset checkpoints in the development and testing cycle of application programs.

In the description that follows, capitalized names refer to variables. Specifically, NEWFILE is the file name for the "new" (updated) data set, and OLDFILE is the file name for the "old" (original) data set.

Central to the program is a FIFO (first-in/first-out) stack called OLD, which holds records read from OLDFILE. The user specifies the size of OLD, and this value is stored in NMAX.

The user specifies:
A. Which positions (columns) of the records are to be used when making comparisons.
B. How far ahead (NAHEAD) the program should look for multiple successful compares before a match can be considered to have occurred.
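The two user-specified parameters can be sketched in Python (standing in for the original PL/I). The names `key`, `records_match`, and `confirmed_match` are hypothetical; columns are assumed to be given as 1-based, inclusive (start, end) ranges, and NAHEAD is assumed to mean that a match is accepted only when NAHEAD consecutive record pairs all compare equal. That is one plausible reading of item B, not necessarily the original program's rule.

```python
from typing import Sequence, Tuple

# Assumed representation: 1-based, inclusive (start, end) column ranges.
Columns = Sequence[Tuple[int, int]]


def key(record: str, columns: Columns) -> str:
    """Extract only the user-selected columns of a record."""
    return "".join(record[start - 1:end] for start, end in columns)


def records_match(old_rec: str, new_rec: str, columns: Columns) -> bool:
    """Compare two records using only the selected columns."""
    return key(old_rec, columns) == key(new_rec, columns)


def confirmed_match(old: Sequence[str], i: int, new: Sequence[str], j: int,
                    nahead: int, columns: Columns) -> bool:
    """Assumed reading of NAHEAD: accept old[i] ~ new[j] only if the NAHEAD
    consecutive pairs starting there all compare equal."""
    if i >= len(old) or j >= len(new):
        return False
    for k in range(nahead):
        if i + k >= len(old) or j + k >= len(new):
            break  # assumption: running out of data does not veto the match
        if not records_match(old[i + k], new[j + k], columns):
            return False
    return True


cols = [(1, 6)]
old = ["AAAAAAxx", "BBBBBByy", "CCCCCC"]
new = ["AAAAAAzz", "BBBBBBqq", "DDDDDD"]
print(confirmed_match(old, 0, new, 0, 2, cols))  # True
print(confirmed_match(old, 0, new, 0, 3, cols))  # False
```

A larger NAHEAD makes the program less likely to pair records that agree only by coincidence (for example, blank or comment lines), at the cost of more comparisons per candidate.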

The following description assumes that an end-of-file is not encountered when reading NEWFILE or OLDFILE. The changes to the processing algorithm needed when an end-of-file is encountered are straightforward.

To start processing, the stack OLD is filled with records from OLDFILE, and a record from NEWFILE is read into NEWLINE. OLD is treated as wraparound storage, and its current origin is stored in NEWSTART. As records in OLD are processed, new records are read from OLDFILE into the positions occupied by the processed records, and a new value is stored in NEWSTART. The PL/I "MOD" function is used extensively to implement this wraparound processing of OLD.
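The wraparound handling of OLD can be sketched as a fixed-size circular buffer indexed with Python's `%` operator, standing in for PL/I MOD. The class `OldBuffer` and its methods `at` and `consume` are hypothetical; NMAX and NEWSTART follow the text, and the sketch assumes NEWSTART always marks the oldest unprocessed record, with `None` marking slots left empty once OLDFILE is exhausted.

```python
from typing import Iterator, List, Optional


class OldBuffer:
    """Circular FIFO holding up to NMAX records read from OLDFILE."""

    def __init__(self, oldfile: Iterator[str], nmax: int) -> None:
        self.oldfile = oldfile
        self.nmax = nmax
        self.newstart = 0  # NEWSTART: current origin of the wraparound storage
        # Fill OLD from OLDFILE to start processing (None = end of file).
        self.slots: List[Optional[str]] = [next(oldfile, None) for _ in range(nmax)]

    def at(self, offset: int) -> Optional[str]:
        """Record at a logical offset from the origin; MOD gives the wraparound."""
        return self.slots[(self.newstart + offset) % self.nmax]

    def consume(self, count: int) -> List[Optional[str]]:
        """Treat `count` records at the origin as processed: refill each freed
        slot from OLDFILE and advance NEWSTART past it."""
        processed: List[Optional[str]] = []
        for _ in range(count):
            pos = self.newstart
            processed.append(self.slots[pos])
            self.slots[pos] = next(self.oldfile, None)  # reuse the freed slot
            self.newstart = (pos + 1) % self.nmax
        return processed


buf = OldBuffer(iter(["r1", "r2", "r3", "r4", "r5"]), nmax=3)
print(buf.consume(2))                  # ['r1', 'r2']
print([buf.at(k) for k in range(3)])  # ['r3', 'r4', 'r5']
```

Because each freed slot is refilled in place and only NEWSTART moves, no records are shifted within OLD, which fits the stated goal of minimizing data movement.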

After a record is read from NEWFILE into NEWLINE, the program searches the records stored in OLD, starting at position NEWSTART, to determine where matches for NEWLINE exist (comparing only the columns specified by the user). The following conditions are now cons...
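The search step alone can be sketched as a scan of OLD in wraparound order from NEWSTART. The function `find_candidates` and its parameters are hypothetical; columns are assumed to be 1-based, inclusive ranges, and empty slots (`None`) are assumed to mark positions not refilled after OLDFILE ends. The sketch only locates candidate positions; it does not implement the conditions the program then applies, including the NAHEAD look-ahead.

```python
from typing import List, Optional, Sequence, Tuple


def find_candidates(old: Sequence[Optional[str]], newstart: int, newline: str,
                    columns: Sequence[Tuple[int, int]]) -> List[int]:
    """Scan OLD in wraparound order from NEWSTART and return the logical
    offsets (0 = the record at NEWSTART) whose selected columns equal those
    of NEWLINE. Empty slots (None) are skipped."""

    def key(rec: str) -> str:
        # Compare only the user-selected, 1-based inclusive column ranges.
        return "".join(rec[s - 1:e] for s, e in columns)

    nmax = len(old)
    target = key(newline)
    hits: List[int] = []
    for offset in range(nmax):
        rec = old[(newstart + offset) % nmax]  # MOD gives the wraparound
        if rec is not None and key(rec) == target:
            hits.append(offset)
    return hits


# OLD after some refills: the origin (NEWSTART) is slot 2.
print(find_candidates(["r4 X", "r5 Y", "r3 X"], 2, "nn X", [(4, 4)]))  # [0, 1]
```

Returning offsets relative to NEWSTART, rather than raw slot indexes, keeps the results in the same order as the records were read from OLDFILE, so the nearest candidate is the first element.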