Browse Prior Art Database

Fixed Time Cost Management of a Sorted Group of Objects in Storage

IP.com Disclosure Number: IPCOM000020395D
Original Publication Date: 2003-Nov-19
Included in the Prior Art Database: 2003-Nov-19
Document File: 5 page(s) / 67K

Publishing Venue

IBM

Abstract

The Fundamental Problem The time cost to manage a sorted list containing N objects is defines as the cost of inserting the N+1 object, or the cost of finding one. The theoretic mean cost is in the order of Log(N) denoted in the literature as O(Log(N)). The total cost of building this list, or of accessing all its objects based on this cost model is therefore O(N*Log(N)). If the list can only be accessed sequentially (i.e., no direct access to the Nth element), than the cost per object is O(N), and the total cost is O(N2). Direct access to the items is possible if they are of a fixed size, or there is a management structure consisting of fixed size pointers to them. When the list consists of variable size objects, with very large N, compounded into a larger object, this management problem becomes critical to the system performance. Objects in storage systems bear a large cost tag per operation, specifically when the lists and/or objects are large and require significant I/O penalties.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 20% of the total text.

Page 1 of 5

Fixed Time Cost Management of a Sorted Group of Objects in Storage

The Fundamental Problem The time cost to manage a sorted list containing N objects is defines as the cost of inserting the N+1 object, or the cost of finding one. The theoretic mean cost is in the order of Log(N) denoted in the literature as O(Log(N)). The total cost of building this list, or of accessing all its objects based on this cost model is therefore O(N*Log(N)). If the list can only be accessed sequentially (i.e., no direct access to the Nth element), than the cost per object is O(N), and the total cost is O(N2). Direct access to the items is possible if they are of a fixed size, or there is a management structure consisting of fixed size pointers to them. When the list consists of variable size objects, with very large N, compounded into a larger object, this management problem becomes critical to the system performance. Objects in storage systems bear a large cost tag per operation, specifically when the lists and/or objects are large and require significant I/O penalties.

The Fundamental Solution The solution presented here can be termed "Cached list" whereby in addition to the list, we also maintain a small cache of size M where M << N (i.e., M is much smaller than N). The cost management of the cached list per operation, compared with the original list, drops from the theoretic O(N) or O(Log(N)) to O(M) or O(Log(M)). Since M is small and for all practical purposes can be considered fixed, we have revolutionized the management time per operation to a fix, and denote it O(1). The total management time is than O(N).

    Caches are used for many purposes in data processing, specifically when mediating between fast and slow storage media. The secret of caches is in their hit rate, which depends on the access patterns to the data. In our solution, if the hit rate 2is (for instance) 99%, than the actual cost is O(N*0.01 + N * 0.99). Caches tend to produce such high hit rates.

    We define cache hit if the cache contains reference to the object being searched, or the object immediately preceding it. In such cases, and in case of sequential access pattern we reach a hit rate of 100%, and our outstanding claim above. In general, the cache will help the search for the proper object, or the insertion point of a new object by permitting to start the search from a close by point and not from the beginning of the list.

Basic Description The cache is a small collection of hints. A hint is a reference to an object in the list. In fact, for the sequential access pattern, even a single hint can make a big difference. The hint contents may be a simple as a mere pointer to the object location in the list, which has to be accessed in order to execute the comparison in the search criteria. A better hint will hold also the comparison parameters or fields from the real object and save access time to intermediate objects.

    To describe the algorithm, we will use the insertion solution...