Cache Allocation Scheme to Conserve Bus Bandwidth for Use with a DEMI Cache
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Gregoire, DG: AUTHOR [+1]
This article describes a scheme employed to preserve memory bus bandwidth while maintaining coherency for a Dirty/Exclusive/Modified/Invalid (DEMI) cache protocol. A second objective of this scheme was to preserve design simplicity.
In a PowerPC* 601-based system, the memory bus operations are
bursts, and coherency usually is based on 32-byte units. Therefore,
it is simplest to work in 32-byte units. However, this may be
inefficient with respect to the actual data transferred, so the cache
allocation scheme should minimize the number of 32-byte bursts
executed. Additionally, shifts to another memory page are more
expensive because the IO controller maintains cache page allocation
information (Translation Control Entry (TCE)). This information
resides in a different space from most data, so fetching it disrupts
the memory controller's prefetch pipeline.
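The cost of working in fixed 32-byte units can be illustrated with a small calculation. The sketch below is not part of the original disclosure; the function names `bursts_needed` and `burst_efficiency` are hypothetical, and the only figure taken from the text is the 32-byte burst size.

```python
BURST_BYTES = 32  # burst/coherency unit size stated in the text


def bursts_needed(addr: int, length: int) -> int:
    """Number of 32-byte bursts touched by an access of `length` bytes at `addr`."""
    if length <= 0:
        return 0
    first = addr // BURST_BYTES
    last = (addr + length - 1) // BURST_BYTES
    return last - first + 1


def burst_efficiency(addr: int, length: int) -> float:
    """Fraction of the transferred bytes that the access actually asked for."""
    n = bursts_needed(addr, length)
    return length / (n * BURST_BYTES) if n else 0.0
```

Under these assumptions, a 4-byte access that straddles a 32-byte boundary needs two bursts and uses only 4 of the 64 bytes moved, which is the kind of inefficiency the allocation scheme tries to limit.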
The allocation scheme employed first attempts to locate a cache
set which holds data from the same page as the desired access.
This saves the effort of fetching the TCE and potentially prevents
disruption of the memory controller's data pipeline. If multiple
page matches are detected as a side effect of displacing dirty data,
an arbitrary cache set is chosen if all have the same status. (In
that case all are invalid, because in a multiple-match situation
either all cache sets are invalid or all but one are.) If one can be
serviced with potentially less bus traffic, it is chosen.
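The page-match step can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the `State` and `CacheSet` types, the `pick_page_match` function, and the relative `BUS_COST` values are all hypothetical. The cost ordering is inferred from the text (invalid data costs nothing to reuse, prefetched data may cost one later access, dirty data costs two accesses to clear).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class State(Enum):
    INVALID = "invalid"
    PREFETCHED = "prefetched"
    DIRTY = "dirty"


# Assumed relative bus cost of reusing a set in each state (illustrative only).
BUS_COST = {State.INVALID: 0, State.PREFETCHED: 1, State.DIRTY: 2}


@dataclass
class CacheSet:
    page: Optional[int]  # page tag retained by the set, or None if never used
    state: State


def pick_page_match(sets: Sequence[CacheSet], page: int) -> Optional[int]:
    """Index of the cheapest set tagged with `page`, or None if no set matches."""
    matches = [i for i, s in enumerate(sets) if s.page == page]
    if not matches:
        return None
    # min() keeps the first of several equally cheap candidates, which stands
    # in for the "arbitrary" choice among sets with the same status.
    return min(matches, key=lambda i: BUS_COST[sets[i].state])
```

For example, with one prefetched and one invalid set tagged with the same page, the sketch returns the invalid set, since reusing it is assumed to generate the least bus traffic.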
If no page match is found, then an unused (invalid) cache set
is chosen, if available. If all cache sets are in use, then
prefetched data is displaced, since this costs no bus accesses and
only one bus access to replace the data if it is needed again later.
A Least Recently Used (LRU) algorithm is used to alternate between
multiple cache sets if all contain prefetched data, to prevent
If dirty data exists in a cache set, it is displaced in
preference to any other activity. This is contrary to the
desire to conserve bus bandwidth, because two bus accesses are
required to clear dirty data, but it is necessary to maintain
coherency. If more than one dirty cache set is allowed to exist at
one time, then the order of the memory updates cannot be kept
identical to the order of the accesses to the IO controller unless a
complicated scheme to track initial update order is implemented.
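Taken together, the selection order described so far (dirty data first, then a page match, then an unused set, then LRU among prefetched sets) can be sketched as below. This is one reading of the text, not the disclosed logic: the types, the `choose_set` function, the action labels, and the `last_used` timestamp are hypothetical, and the sketch assumes every set is in exactly one of the three states named in the text and that at least one set exists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    INVALID = "invalid"
    PREFETCHED = "prefetched"
    DIRTY = "dirty"


@dataclass
class CacheSet:
    page: Optional[int]  # page tag retained by the set
    state: State
    last_used: int = 0   # hypothetical LRU timestamp; smaller means older


def choose_set(sets: List[CacheSet], page: int) -> Tuple[int, str]:
    """Return (set index, action label) for an access to `page`."""
    # 1. Dirty data is displaced before anything else, to preserve ordering.
    for i, s in enumerate(sets):
        if s.state is State.DIRTY:
            return i, "flush dirty"
    # 2. Prefer a set tagged with the requested page (avoids a TCE fetch);
    #    among matches, an invalid set is assumed cheaper than a prefetched one.
    matches = [i for i, s in enumerate(sets) if s.page == page]
    if matches:
        best = min(matches, key=lambda i: sets[i].state is not State.INVALID)
        return best, "page match"
    # 3. Otherwise take any unused (invalid) set.
    for i, s in enumerate(sets):
        if s.state is State.INVALID:
            return i, "unused set"
    # 4. Every set now holds prefetched data: displace the least recently used.
    lru = min(range(len(sets)), key=lambda i: sets[i].last_used)
    return lru, "displace prefetched (LRU)"
```

With two cache sets, for instance, a prefetched set for page 1 and a dirty set for page 2 yield the dirty set regardless of the requested page, reflecting the rule that clearing dirty data takes priority over saving bus accesses.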
An implementation of the scheme described above is
given below, using an example with two cache sets, showing the action
taken and the justification. (Note that prefetched and dirty are
exclusive states, so only possible cases are shown.)