
Method for paring archived data to allow for smooth depreciation

IP.com Disclosure Number: IPCOM000013922D
Original Publication Date: 2001-Jan-01
Included in the Prior Art Database: 2003-Jun-19
Document File: 2 page(s) / 43K

Publishing Venue

IBM

Abstract

Disclosed is a method for selecting time-sequenced archive data for replacement with newer data. The method allows for discounting in the same sense as financial depreciation, in order to favor more recent data, while systematically saving diminishing samples of older data. Time resolution of the data points is reduced according to the age of the data. The resulting non-uniform time resolution allows the archive to span more total time, without sacrificing high-resolution coverage of recent results.


This disclosure is concerned with time-sequenced archive data: data that becomes available at regular intervals and is saved for an extended time, considerably longer than the availability interval. For example, a test might be run daily and the results saved for years. Unless storage is unlimited, it will eventually be necessary to delete some old results to make room for newer ones. Deletion may also be desirable to improve search efficiency or for other reasons.

The question, then, is which data to delete. A reasonable approach is to delete the oldest data first -- first in, first out. For some data, however, this is not the best approach. Although the most recent data will almost always be considered the most important, it may be better to have access to data spanning a longer period of time than to have the same amount of data spanning a shorter period in greater detail. For example, archived daily test data might be examined to determine when certain tests began to fail. If the tests began to fail longer ago than the span of the archive, it will only be possible to place a lower bound on the time since the last successful run (namely, a lower bound equal to the span of the archive). If instead some of the data of intermediate age had been discarded to make room for new data, the time span covered by the archive could have been much longer. This would improve the likelihood that both an upper and a lower bound could be placed on the time since the last success; failing that, at least the lower bound would be larger.
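To make the bounding argument concrete, the following is a minimal Python sketch (the disclosure contains no code; the function name and record layout are illustrative assumptions). It bounds the time since the last successful run given an archive of (day, passed) records, whether or not the records are uniformly spaced:

    def time_since_success_bounds(archive, today):
        """Bound the number of days since the test last passed.

        `archive` is a list of (day, passed) records; spacing need
        not be uniform. Returns (lower, upper); `upper` is None when
        every archived run failed, i.e. the last success predates
        the oldest record and only a lower bound is possible.
        """
        records = sorted(archive, key=lambda r: r[0])   # oldest first
        passes = [day for day, ok in records if ok]
        if not passes:
            # Last success is older than anything still archived.
            return (today - records[0][0], None)
        last_pass = max(passes)
        fails_after = [day for day, ok in records
                       if not ok and day > last_pass]
        upper = today - last_pass                       # it passed this recently
        lower = today - min(fails_after) if fails_after else 0
        return (lower, upper)

With a first-in-first-out archive, a failure older than the archive leaves `upper` as None; a thinned archive of the same size reaches further back, so its oldest record is older and the lower bound correspondingly larger.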

To extend the age of the oldest archived data without using more storage, time resolution must be traded for time span. For example, one might save only the test results from every third day. Generally, though, it will be desirable to keep as much resolution as possible in the recent data; as the data gets older, lower time resolution becomes appropriate. So an algorithm is needed to choose which data to discard, and not always the oldest. Let

    N  = total number of archived records to be retained
    P0 = number of archived records to retain over the maximum-resolution span
         (the archive is always to contain the P0 most recent records)

Partition N into subsizes P0, P1, P2, ... so that P0 + P1 + P2 + ... = N. The sizes of the various Pi reflect the value placed on resolution at different ages; generally one would expect N > P0 >= P1 >= P2 >= ....
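This abbreviated text does not specify how each band Pi maps to a time resolution, so the following Python sketch rests on an assumed rule: band i retains Pi records spaced 2^i availability intervals apart. The doubling rule, the function name, and the record layout are assumptions for illustration, not part of the disclosure.

    def pare_archive(records, band_sizes):
        """Select which records to retain, thinning more as data ages.

        `records` is a list of (timestamp, data) tuples; `band_sizes`
        is [P0, P1, P2, ...] with P0 + P1 + ... = N as defined above.
        ASSUMPTION: band i keeps one record per 2**i intervals; the
        abbreviated text leaves the per-band resolution unspecified.
        """
        ordered = sorted(records, key=lambda r: r[0], reverse=True)  # newest first
        kept, pos = [], 0
        for i, p in enumerate(band_sizes):
            stride = 2 ** i                  # coarser resolution for older bands
            # Band 0 (stride 1) is simply the P0 most recent records.
            kept.extend(ordered[pos : pos + p * stride : stride])
            pos += p * stride
        return kept

Under this assumed rule, with N = 12 partitioned as P0 = 6, P1 = 4, P2 = 2, the 12 retained records cover 6 + 4*2 + 2*4 = 22 availability intervals rather than 12, nearly doubling the span of the archive while keeping the six most recent records at full resolution.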