Cache Organization to Maximize Fetch Bandwidth

IP.com Disclosure Number: IPCOM000035311D
Original Publication Date: 1989-Jul-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 2 page(s) / 15K

Publishing Venue

IBM

Related People

Grohoski, GF: AUTHOR (and 2 others)

Abstract

A common bottleneck in the performance of RISC processors is the instruction dispatch rate from the instruction fetching unit of the processor. This dispatch rate is directly dependent upon the instruction fetch rate from memory. A common solution to the memory fetch-rate problem is the use of an instruction cache. However, when the processor can execute multiple instructions at once, extraction of instructions from the instruction cache can itself limit performance.

The technique described in this article allows up to 4 instructions to be fetched from the instruction cache each cycle. It also allows cache-line crossings to occur within a group of 4 instructions, as long as both cache lines are resident in the cache. If a group of 4 instructions crosses a page boundary, fewer than 4 instructions are fetched; however, the technique can easily be extended to allow groups of 4 instructions to be fetched across page boundaries. It also generalizes readily to fetching any power-of-two number of instructions from the cache.

To reiterate, this technique allows 4 instructions to be fetched from the cache every cycle, regardless of the starting address of the group of four instructions, except that the group cannot span a page boundary.
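As a concrete illustration (not part of the original disclosure), the following C sketch models the fetch-group rule just stated. The 4-byte instruction size follows from the article's word-wide arrays and 64-byte lines; the 4KB page size is an assumption, since the article does not give one.

    #include <stdint.h>
    #include <stdbool.h>

    /* Assumed parameters: 4-byte RISC instructions, the article's
     * 64-byte cache lines, and a hypothetical 4KB page size (the
     * disclosure does not state one). */
    #define INSN_BYTES  4u
    #define LINE_BYTES  64u
    #define PAGE_BYTES  4096u
    #define GROUP       4u   /* instructions fetched per cycle */

    /* A group of GROUP sequential instructions starting at `addr`
     * may span two cache lines; that is permitted as long as both
     * lines are resident, so this check is informational only. */
    static bool group_crosses_line(uint32_t addr)
    {
        uint32_t last = addr + (GROUP - 1) * INSN_BYTES;
        return (addr / LINE_BYTES) != (last / LINE_BYTES);
    }

    static bool group_crosses_page(uint32_t addr)
    {
        uint32_t last = addr + (GROUP - 1) * INSN_BYTES;
        return (addr / PAGE_BYTES) != (last / PAGE_BYTES);
    }

    /* Number of instructions actually fetched this cycle: the full
     * group unless it would span a page boundary. */
    static unsigned insns_fetched(uint32_t addr)
    {
        if (!group_crosses_page(addr))
            return GROUP;
        /* Fetch only up to the end of the current page. */
        uint32_t to_page_end = PAGE_BYTES - (addr % PAGE_BYTES);
        return to_page_end / INSN_BYTES;
    }

Under these assumptions, a group starting 12 bytes before the end of a page yields only 3 instructions, while a group that merely crosses a cache-line boundary still yields all 4, provided both lines are resident.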

This multiple-fetch technique will be illustrated for an 8KB, two-way set-associative cache with 64-byte lines. To accommodate the fetching of four instructions, the cache arrays are split into eight arrays. Each array is one word wide; a single cache line therefore occupies four physical rows in each of four arrays (since the cache is two-way set-associative, four arrays are dedicated to each set). The cache line is placed into the cache such that the first, second, third, and fourth words of the line go into row i of the first, second, third, and fourth arrays, respectively. Similarly, the fifth, sixth, seventh, and eighth words go into row i+1 of the four arrays, and so on for the remaining eight words of the cache line. Note also that if cache line j occupies rows i through i+3, then cache line j+1 occupies rows i+4 through i+7, unless cache line j+1 is the first line in a page, in which case it wraps around to the beginning of the cache.
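The placement just described can be summarized with a small amount of index arithmetic. The C sketch below is an illustrative model (the `locate` helper and its names are not from the article); it maps a byte address to the array and row holding that word, within one set of the example cache.

    #include <stdint.h>

    /* Geometry from the article: 8KB, two-way set-associative,
     * 64-byte lines, data held in eight one-word-wide arrays
     * (four per set). */
    #define WORD_BYTES     4u
    #define LINE_BYTES     64u
    #define LINES_PER_SET  64u   /* (8KB / 2 ways) / 64-byte lines */
    #define ARRAYS_PER_SET 4u
    #define ROWS_PER_LINE  (LINE_BYTES / (ARRAYS_PER_SET * WORD_BYTES)) /* = 4 */

    struct cell { unsigned array; unsigned row; };

    /* Locate the word at byte address `addr` within one set's
     * arrays.  Word k of a line lives in array k mod 4, and a line
     * occupies four consecutive rows. */
    static struct cell locate(uint32_t addr)
    {
        uint32_t line = (addr / LINE_BYTES) % LINES_PER_SET;
        uint32_t word = (addr % LINE_BYTES) / WORD_BYTES;  /* 0..15 */

        struct cell c;
        c.array = word % ARRAYS_PER_SET;
        c.row   = line * ROWS_PER_LINE + word / ARRAYS_PER_SET;
        return c;
    }

Because word k of a line always falls in array k mod 4, any four sequential words occupy four different arrays. They can therefore be read out in a single cycle, even when the group straddles two lines, by driving each array with its own row address.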

The cache directory associated with the cache has two sets of 64 entries, representing sets A and B. Since the ability to fetch from two lines simultaneously is desired, the ability to check the validity of two cache lines is required. This is accomplished by breaking the two sets of 64 cache directory entries into four arrays of 32 entries each, where all the even-numbered lines reside in two of the arrays (one for each set) and all of the odd-numbered lines reside in the other two arrays. Given a particular cache line address, which can be either even or odd, the next cache line will be odd or even, respectively. Note that if cache line j is odd, then cache line j+1 is even and the base cach...
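The even/odd directory split lends itself to a simple index computation. The C sketch below is a hypothetical reconstruction of that arithmetic (the `probe` helper and its names are not from the article): whichever parity line j has, line j+1 has the other, so the two tag lookups land in different directory arrays and can proceed in parallel.

    /* Directory geometry from the article: two sets (A and B) of 64
     * entries each, split by line parity into four arrays of 32
     * entries so the tags for lines j and j+1 can be checked at
     * the same time. */
    #define LINES_PER_SET 64u
    #define DIR_ENTRIES   (LINES_PER_SET / 2)  /* 32 per directory array */

    struct dir_lookup {
        unsigned even_index;  /* entry probed in the even-line arrays */
        unsigned odd_index;   /* entry probed in the odd-line arrays  */
    };

    /* For a fetch that may span lines j and j+1: one line is even
     * and the other odd, so each resides in a different pair of
     * directory arrays and both validity checks finish in one
     * cycle. */
    static struct dir_lookup probe(unsigned j)
    {
        struct dir_lookup d;
        if (j % 2 == 0) {
            d.even_index = j / 2;                      /* line j   */
            d.odd_index  = j / 2;                      /* line j+1 */
        } else {
            d.odd_index  = j / 2;                      /* line j   */
            /* Assumed wrap for the last line, mirroring the
             * page-wrap behavior described for the data arrays. */
            d.even_index = (j / 2 + 1) % DIR_ENTRIES;  /* line j+1 */
        }
        return d;
    }

For example, probing from line 1 checks odd entry 0 (line 1) and even entry 1 (line 2) simultaneously; probing from line 63 wraps the even-array index back to entry 0.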