A Caching Strategy to Minimize Stored Data

IP.com Disclosure Number: IPCOM000012756D
Original Publication Date: 2003-May-27
Included in the Prior Art Database: 2003-May-27
Document File: 1 page(s) / 44K

Publishing Venue

IBM

Abstract

Disclosed is an efficient strategy for caching data that minimizes the amount of data stored. The strategy is most applicable to trace tools, but the methodology could be generalized to a variety of applications.

This is the abbreviated version, containing approximately 68% of the total text.

Computer systems often employ a trace facility to collect data for purposes such as information gathering, data analysis, and debugging. System traces are notorious for collecting so much data that the sheer size of the resulting files often inhibits the use of these tools.

Some trace data streams contain a significant number of recurring data blocks. One example is a graphics data stream: during an animation sequence, the same data is rendered multiple times with only small changes to the view transformations. Eliminating the storage of redundant data can therefore significantly reduce the overall storage required for a given data stream, which in turn allows traces to be created for much larger and more complex data streams.

The general strategy used by most trace tools is to receive or intercept a data record and write it to some type of buffer or file. The proposed invention calculates a checksum for the data record and then uses the checksum as a key for indexing into a cache of data records through some type of lookup method, such as a hash table or binary tree.
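
A checksum-keyed cache of this kind might be sketched in Python as follows. This is only an illustration under stated assumptions: the class name RecordCache, its methods, and the choice of CRC-32 are not taken from the disclosure, which names no particular checksum algorithm. A dict stands in for the hash table, and records are compared byte for byte so that two different records with the same checksum are not confused.

```python
import zlib


class RecordCache:
    """Cache of data records indexed by checksum (hypothetical name)."""

    def __init__(self):
        # Hash table: checksum -> list of distinct records sharing that checksum.
        self._by_checksum = {}

    @staticmethod
    def checksum(record: bytes) -> int:
        # CRC-32 is an assumption; the text does not name a checksum algorithm.
        return zlib.crc32(record)

    def lookup(self, record: bytes):
        """Return (checksum, slot) if the record is cached, else None."""
        key = self.checksum(record)
        for slot, cached in enumerate(self._by_checksum.get(key, [])):
            # Compare the bytes so that different records with equal
            # checksums are not mistaken for each other.
            if cached == record:
                return key, slot
        return None

    def insert(self, record: bytes):
        """Store the record and return its (checksum, slot) identifier."""
        key = self.checksum(record)
        bucket = self._by_checksum.setdefault(key, [])
        bucket.append(record)
        return key, len(bucket) - 1


cache = RecordCache()
ident = cache.insert(b"draw triangle strip 17")
assert cache.lookup(b"draw triangle strip 17") == ident
assert cache.lookup(b"set view matrix A") is None
```

A binary tree keyed on the checksum would serve the same purpose; the dict is used here only because it is the most direct hash table available in Python.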

The general process is to calculate a checksum for each data record as it is received. The calculated checksum is then used to determine whether the data record already exists in the cache. If the data record exists then a...
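
The part of the process stated above, computing a checksum per record and testing for presence in the cache, could be sketched as a streaming loop. The function name classify_records and the use of CRC-32 are assumptions made for illustration. The sketch only flags each record as new or already cached; what a trace tool writes in either case is left to the caller.

```python
import zlib


def classify_records(records):
    """Yield (checksum, is_new, record) for each record in arrival order."""
    seen = {}  # checksum -> list of distinct records with that checksum
    for record in records:
        key = zlib.crc32(record)  # assumed checksum function
        bucket = seen.setdefault(key, [])
        # Membership is tested on the bytes themselves, so a checksum
        # collision cannot cause a new record to be reported as cached.
        is_new = record not in bucket
        if is_new:
            bucket.append(record)
        yield key, is_new, record


# An animation-like stream: the same mesh recurs between view changes.
stream = [b"mesh A", b"view 1", b"mesh A", b"view 2", b"mesh A"]
flags = [is_new for _, is_new, _ in classify_records(stream)]
# flags == [True, True, False, True, False]
```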