Cache Line Replacement Selection using a Logical Multi-Way Tree with Access Order States Maintained at Each Node

IP.com Disclosure Number: IPCOM000030586D
Original Publication Date: 2004-Aug-18
Included in the Prior Art Database: 2004-Aug-18
Document File: 3 page(s) / 55K

Publishing Venue

IBM

Abstract

An improved method for cache line replacement selection in a set-associative cache is disclosed. In this method, a line is selected for replacement within a set in an N-way set associative cache using a logical multi-way tree, where the leaf nodes of the tree represent lines in a set, and where if the branching factor of a given node in the logical tree at a given level is k, then the state of the node is represented by one of k! (k factorial) logical states representing an access order of the nodes below the given node in the logical tree.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Consider an N-way set associative cache. For small N (e.g., N=4), it is feasible to select the LRU (least recently used) line for replacement within a set, since maintaining the access order does not require a large number of states (for N=4 there are 4! = 24 states, and a state machine with 5-bit state numbers can be used; there are alternative implementations as well for small N). However, in newer processor architectures it may become desirable, for a variety of reasons (for example, multi-threaded processor designs), to increase the cache associativity at various levels of the cache hierarchy. For larger values of N (e.g., N=8 or 16), maintaining the full access ordering becomes impractical, since the number of states grows factorially (for example, 8! = 40320 and 16! = 20922789888000). A well-known solution is to approximate LRU using, for example, the TreeLRU replacement selection method (also known as pseudo-LRU or pLRU). In this method, one bit indicates whether the most recent reference was to a line in the first half or the second half of the lines in a set; the technique is then applied recursively, resulting in a logical binary tree with N-1 nodes (thus N-1 bits are required to represent a TreeLRU state).

TreeLRU does not approximate LRU particularly well, as shown by the following graph for the case N=16. These results were generated as follows: 100,000 random permutations of 0, 1, 2, ..., 15 were generated; for each permutation, the lines in a set were accessed in that order, producing a particular TreeLRU state; finally, the line selected for replacement by TreeLRU was found, its position in the full LRU ordering of the set was determined (where 0 is the LRU line, 1 is the 2nd LRU line, etc.), and the statistics were updated.

[Figure: TreeLRU Replacement Statistics, 16-Way (100,000 random permutations). X axis: LRU Position (0 = LRU), positions 0 through 15; Y axis: Number Selected, in thousands, 0 to 25.]

For comparison, if the LRU line had always been selected, the graph would have a single bar at LRU position 0 of height 100,000.
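
To make the experiment concrete, the following C sketch simulates TreeLRU victim selection for N=16 over random permutations and tallies where each victim falls in the true LRU order, in the same way the statistics above were generated. Only the one-bit-per-node policy is taken from the description above; the array encoding of the tree and the function names are illustrative assumptions.

/*
 * Sketch: TreeLRU (pseudo-LRU) victim selection for a 16-way set.
 * The logical binary tree is stored in an array: node 0 is the root
 * and node i has children 2*i+1 and 2*i+2.  Each of the 15 internal
 * nodes holds one bit: 0 means the most recent reference in that
 * node's range went to the left half, 1 means the right half.  The
 * victim walk takes the opposite direction at every node.
 */
#include <stdio.h>
#include <stdlib.h>

#define WAYS  16
#define NODES (WAYS - 1)

/* Update the tree bits on an access (hit or fill) to line `way`. */
static void plru_touch(unsigned char bits[NODES], int way)
{
    int node = 0, lo = 0, hi = WAYS;
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (way < mid) { bits[node] = 0; node = 2 * node + 1; hi = mid; }
        else           { bits[node] = 1; node = 2 * node + 2; lo = mid; }
    }
}

/* Walk away from the recently used halves to pick a victim line. */
static int plru_victim(const unsigned char bits[NODES])
{
    int node = 0, lo = 0, hi = WAYS;
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (bits[node] == 0) { node = 2 * node + 2; lo = mid; } /* left recent -> go right */
        else                 { node = 2 * node + 1; hi = mid; } /* right recent -> go left */
    }
    return lo;
}

int main(void)
{
    long histogram[WAYS] = {0};   /* victim's position in the true LRU order */
    int order[WAYS];

    srand(1);
    for (int trial = 0; trial < 100000; trial++) {
        unsigned char bits[NODES] = {0};

        /* Random permutation of 0..15 (Fisher-Yates). */
        for (int i = 0; i < WAYS; i++) order[i] = i;
        for (int i = WAYS - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = order[i]; order[i] = order[j]; order[j] = t;
        }

        /* Access every line once in this order; order[0] is then the
         * true LRU line, order[1] the 2nd LRU line, and so on. */
        for (int i = 0; i < WAYS; i++) plru_touch(bits, order[i]);

        /* Record where the TreeLRU victim falls in the true LRU order. */
        int victim = plru_victim(bits);
        for (int pos = 0; pos < WAYS; pos++)
            if (order[pos] == victim) { histogram[pos]++; break; }
    }

    for (int pos = 0; pos < WAYS; pos++)
        printf("LRU position %2d: %ld\n", pos, histogram[pos]);
    return 0;
}

A run of such a sketch should reproduce the qualitative behavior shown in the graph: the selected line is spread over many LRU positions rather than being concentrated at position 0.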

Although it is true that the probability of re-reference of a line in a set (where the probability is computed over all sets in the cache) decreases sharply with increasing age of the line within the set, it is also the case that a certain small fraction of sets are typically "hot" and generate a proportionately large fraction of all cache misses. Therefore, one would expect that a method that approximates LRU more closely could give non-trivial performance improvements (in terms of decreasing cache misses for the "hot" sets). However, an improved method is only practical if it can be implemented efficiently. A method is disclosed here that approximates LRU much more closely than TreeLRU, with minimal increase in complexity, and that has an efficient implementatio...
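
Based on the description in the abstract, the following is a minimal C sketch of one possible reading of such a logical multi-way tree with per-node access-order states, applied to a 16-way set organized as a 4-ary tree of depth two. Each node holds the full access order of its four children (one of 4! = 24 states, representable in 5 bits in a packed hardware encoding); here the order is kept as a small LRU-to-MRU list for readability. The structure, names, and update policy are illustrative assumptions, not the disclosed implementation.

/*
 * Sketch: 16-way set as a logical 4-ary tree of depth two.  On an
 * access, the touched child is moved to the MRU end of the order at
 * every node on the path; on replacement, the walk follows the LRU
 * child at every node.
 */
#include <stdio.h>

#define FANOUT 4            /* branching factor k; k! = 24 states per node */
#define WAYS   (FANOUT * FANOUT)

typedef struct {
    int order[FANOUT];      /* child indices, order[0] = LRU ... order[3] = MRU */
} node_t;

typedef struct {
    node_t root;            /* access order of the 4 groups of lines      */
    node_t group[FANOUT];   /* access order of the 4 lines in each group  */
} set_state_t;

/* Move child `c` to the MRU end of a node's order. */
static void node_touch(node_t *n, int c)
{
    int i, j;
    for (i = 0; i < FANOUT; i++)
        if (n->order[i] == c) break;
    for (j = i; j < FANOUT - 1; j++)
        n->order[j] = n->order[j + 1];
    n->order[FANOUT - 1] = c;
}

/* Record an access to line `way` (0..15). */
static void set_touch(set_state_t *s, int way)
{
    node_touch(&s->root, way / FANOUT);
    node_touch(&s->group[way / FANOUT], way % FANOUT);
}

/* Select a victim: follow the LRU child at each level of the tree. */
static int set_victim(const set_state_t *s)
{
    int g = s->root.order[0];
    return g * FANOUT + s->group[g].order[0];
}

int main(void)
{
    set_state_t s;
    for (int i = 0; i < FANOUT; i++)
        for (int c = 0; c < FANOUT; c++) {
            s.root.order[c] = c;
            s.group[i].order[c] = c;
        }

    /* Touch a few lines and show which line would be replaced. */
    int refs[] = { 3, 12, 7, 0, 15, 7, 4 };
    for (unsigned i = 0; i < sizeof refs / sizeof refs[0]; i++)
        set_touch(&s, refs[i]);
    printf("victim = %d\n", set_victim(&s));
    return 0;
}

In this illustrative organization the set is covered by five nodes of 24 states each (5 bits per node when packed, 25 bits per set), compared with the 15 bits TreeLRU uses for N=16 as noted above, while exact access order is preserved both within each group and among the groups.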