Method to determine whether duplicate records exist when the entire space of records cannot be buffered.
Original Publication Date: 2002-May-21
Included in the Prior Art Database: 2003-Jun-20
Disclosed is an algorithm which determines whether there are duplicates in an array of records, where the array of records is so large that it cannot be contained in virtual memory. In z/OS* the PFTE data structure represents the attributes of a real storage frame and contains data that uniquely identifies the page that the frame is currently backing, if the frame is in use. It is a bug when two such PFTEs indicate that the frames they represent are backing the same page. RSMDATA* is a function of IPCS*, a dump analysis tool, which provides diagnostic information about real storage in a storage dump. Part of the function of RSMDATA is to determine whether there is an error with the PFTEs. In prior versions of the operating system, RSMDATA determines whether two frames are backing the same page by first quicksorting the PFTEs using a key composed of attributes of the virtual storage that the frame backs (e.g. the virtual storage address, the address space of the virtual storage, etc.). It then sequentially scans through the array of PFTEs, searching for two (or more) adjacent PFTEs that indicate their frames are backing the same page. The problem with this algorithm in a system with a very large amount of real storage is that the PFTEs can no longer be contained in a single buffer, so quicksort is no longer an option. Below is an algorithm that will catch duplicate records in an array such as the PFTE array, when the set of records is too large to be buffered. (1) Instead of obtaining a single large buffer to accomodate all of the data records, two smaller buffers, of equal size (enough to contain N records), are obtained (call them buffer A and buffer B ). (2) The N lowest records are read into buffer A . (3) The records of buffer A are then added to an open hash table; if in adding a record to the hash table, a match is found then 2 duplicate records have been found.