Dual On-Chip Instruction Cache Organization in High Speed Processors

IP.com Disclosure Number: IPCOM000114337D
Original Publication Date: 1994-Dec-01
Included in the Prior Art Database: 2005-Mar-28
Document File: 2 page(s) / 109K

Publishing Venue

IBM

Related People

Song, SP: AUTHOR

Abstract

Described is an improved design of two on-chip instruction caches with different organizations to provide single-cycle instruction access as well as maintain a high cache hit rate for high speed processors. The first instruction cache is kept small enough, and is direct-mapped, to provide single-cycle access. The second cache is made as large as the design permits and is organized in multiple sets to minimize the cache miss rate, at the expense of taking multiple cycles to access. Unlike other dual-cache organizations where the smaller cache contains a subset of the larger cache (hierarchical organization), the two caches are organized at the same level to improve the overall cache hit rate.

This is the abbreviated version, containing approximately 48% of the total text.

Dual On-Chip Instruction Cache Organization in High Speed Processors

      Described is an improved design of two on-chip instruction
caches with different organizations to provide single-cycle
instruction access as well as maintain a high cache hit rate for
high speed processors.  The first instruction cache is kept small
enough, and is direct-mapped, to provide single-cycle access.  The
second cache is made as large as the design permits and is organized
in multiple sets to minimize the cache miss rate, at the expense of
taking multiple cycles to access.  Unlike other dual-cache
organizations where the smaller cache contains a subset of the larger
cache (hierarchical organization), the two caches are organized at
the same level to improve the overall cache hit rate.

      High speed, high performance processors use an on-chip
instruction cache to supply several instructions per cycle for
processing.  An on-chip cache is necessary since instructions are
fetched several per cycle, at rates many times faster than off-chip
memory can supply them.  An off-chip cache solution would require a
wide interface between the processor and the cache/memory operating
at the processor speed.

      It is desirable to make an on-chip cache as large as the
design permits, to minimize the frequency of off-chip accesses.  The
on-chip cache is usually organized in multiple sets, since this
organization yields higher cache hit rates than a direct-mapped cache
of the same size.  The on-chip cache is also indexed with a physical
address, rather than an effective or virtual address, to avoid having
to flush its contents whenever the effective-to-physical address
mapping changes on a context switch.
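
      As a small illustration of the hit-rate argument, the following
C sketch shows two fetch addresses that evict each other in a
direct-mapped 16K-byte cache but can coexist in a 4-set organization
of the same size.  The 32-byte line size and all names are invented
for the example, not taken from the disclosure.

#include <stdint.h>
#include <stdio.h>

enum { CACHE_SIZE = 16 * 1024, LINE_SIZE = 32, NSETS = 4 };

int main(void)
{
    uint32_t a = 0x00010000u;       /* e.g., a loop body          */
    uint32_t b = a + CACHE_SIZE;    /* e.g., a function it calls  */

    /* Direct-mapped: exactly one candidate line per address, so
     * addresses 16K bytes apart contend for the same line.      */
    uint32_t dm_a = (a % CACHE_SIZE) / LINE_SIZE;
    uint32_t dm_b = (b % CACHE_SIZE) / LINE_SIZE;

    /* 4 sets of 4K bytes: the two addresses still share an index,
     * but there are four lines at that index, so both can stay.  */
    uint32_t sa_a = (a % (CACHE_SIZE / NSETS)) / LINE_SIZE;
    uint32_t sa_b = (b % (CACHE_SIZE / NSETS)) / LINE_SIZE;

    printf("direct-mapped lines: %u vs %u (collide: %s)\n",
           dm_a, dm_b, dm_a == dm_b ? "yes" : "no");
    printf("4-set indices:       %u vs %u (4 candidates each)\n",
           sa_a, sa_b);
    return 0;
}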

      A physically indexed on-chip cache larger than a page makes
cache access slow: indexing such a cache requires address bits above
the page offset, which are available only after translation, and the
cache hit is then determined by comparing the translated fetch
address against several cache line addresses.  The results of these
compares are used to select one of the several cache lines.
Currently, two solutions are used to address this problem.  For the
discussion, assume that the cache size is 16K bytes and the page
size is 4K bytes.  Also assume that the memory management does not
use segments, so that effective and virtual addresses are identical.
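
      Under these assumptions, the arithmetic behind the problem can
be sketched in a few lines of C (again with a hypothetical 32-byte
line size and invented names): the low 12 bits of an address form
the page offset, identical in the virtual and physical addresses,
while a direct-mapped 16K-byte cache needs two index bits above
them, which exist only after translation.

#include <stdint.h>
#include <stdio.h>

enum {
    CACHE_SIZE = 16 * 1024,   /* stated assumption: 16K-byte cache */
    PAGE_SIZE  =  4 * 1024,   /* stated assumption: 4K-byte page   */
    LINE_SIZE  = 32           /* assumed line size for the example */
};

int main(void)
{
    uint32_t fetch = 0x00403A60u;     /* arbitrary fetch address */

    /* Page offset: bits 0..11, identical before and after
     * translation, so usable while translation is in flight.   */
    uint32_t page_offset = fetch % PAGE_SIZE;

    /* A direct-mapped 16K cache with 32-byte lines has 512 lines
     * and is indexed by address bits 5..13.  Bits 12 and 13 lie
     * above the page offset, so the index is incomplete until
     * the physical address is known.                            */
    uint32_t dm_index      = (fetch % CACHE_SIZE) / LINE_SIZE;
    uint32_t bits_above_po = dm_index >> 7;  /* address bits 12..13 */

    printf("page offset (untranslated)     = 0x%03x\n", page_offset);
    printf("direct-mapped line index       = %u\n", dm_index);
    printf("index bits needing translation = %u\n", bits_above_po);
    return 0;
}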

      One solution is to organize the cache in 4 sets of 4K bytes
each.  This way, the cache access is made with the lower 12 bits
(less the number of bits for selecting bytes within a cache line),
which are the same for both the virtual and physical addresses of an
access.  The virtual instruction address is translated at the same
time as the cache access, such that the real instruction address is
available by the time the cache access is finished.  The cache
access yields 4 possible lines, one from each set, as cache hit
candidates.  The four physical address tags (20 bits each) are
compared against the translated real instruction address (upper 20
bits).  If a match is found in one of the four sets, the cache
access resu...
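
      As a rough sketch of the four-set lookup described in this
solution, consider the following C fragment.  The struct layout, the
32-byte line size, and the tlb_translate() stub are invented
stand-ins; the 12-bit index and 20-bit tag split follow the text.
In hardware, the translation and the four set reads proceed in
parallel rather than sequentially as written here.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { NSETS = 4, LINE_SIZE = 32, NLINES = 4096 / LINE_SIZE };

struct cache_line {
    uint32_t tag;             /* upper 20 bits of physical address */
    bool     valid;
    uint8_t  data[LINE_SIZE];
};

static struct cache_line icache[NSETS][NLINES];

/* Stand-in for the TLB: identity mapping, for the sketch only. */
static uint32_t tlb_translate(uint32_t vaddr) { return vaddr; }

/* Returns the hit line, or NULL on a cache miss. */
const struct cache_line *icache_lookup(uint32_t vaddr)
{
    /* Index with the lower 12 bits (less the byte-within-line
     * bits); these are the same in the virtual and physical
     * address, so all four sets are read before translation
     * completes.                                               */
    uint32_t index = (vaddr & 0xFFFu) / LINE_SIZE;

    /* By the time the four candidate lines are read, translation
     * has produced the physical address; compare its upper 20
     * bits against the four tags.                              */
    uint32_t ptag = tlb_translate(vaddr) >> 12;

    for (int set = 0; set < NSETS; set++) {
        const struct cache_line *line = &icache[set][index];
        if (line->valid && line->tag == ptag)
            return line;      /* hit in this set */
    }
    return NULL;              /* miss: fetch from off-chip memory */
}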