
Technique to Allocate Main Memory Frames to Improve Cache Performance Across Sequential Reference Patterns

IP.com Disclosure Number: IPCOM000101682D
Original Publication Date: 1990-Aug-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 4 page(s) / 146K

Publishing Venue

IBM

Related People

Bowen, NS: AUTHOR

Abstract

A rapid reference to a large amount of data is a reference pattern that is found in many workloads. Furthermore, if the data is only referenced a single time, its presence in the processor cache is not beneficial. Reference patterns in this category include: 1. Moving data among buffers. 2. Scanning a buffer to perform tasks such as parsing. 3. Scanning through a large array of data. For example, a FORTRAN DO LOOP on a large matrix.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 50% of the total text.

Technique to Allocate Main Memory Frames to Improve Cache Performance Across Sequential Reference Patterns

       A rapid reference to a large amount of data is a
reference pattern that is found in many workloads.  Furthermore, if
the data is only referenced a single time, its presence in the
processor cache is not beneficial.  Reference patterns in this
category include:
1.  Moving data among buffers.
2.  Scanning a buffer to perform tasks such as parsing.
3.  Scanning through a large array of data.  For example, a
    FORTRAN DO LOOP on a large matrix.
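The single-use patterns above can be illustrated with a short sketch (in Python, for brevity; the disclosure's own example is a FORTRAN DO loop). Each byte is referenced exactly once and never again, so nothing gained by caching it offsets the working set it displaces:

```python
def checksum(buffer):
    """One-pass scan: every byte is read once, then never touched again."""
    total = 0
    for b in buffer:  # sequential, single-use reference pattern
        total = (total + b) & 0xFFFFFFFF
    return total
```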

      As a process references this data, it is loaded into the
processor cache.  In the case that the data is only referenced a
single time, the net effect is that the process's working set in the
cache has been purged by the data.  The process then goes through a
"start-up" transient where there are many cache faults to reload the
cache.  This can have a negative impact on the overall system
performance.

      In this disclosure we propose a technique to allocate the real
frames so that the cache damage caused by the buffer reference is
minimized.  We begin by examining this effect in greater detail.  The
examples in this disclosure are taken from the IBM
System/370 architecture with a 64K 4-way set associative
implementation of the cache, but are applicable to any system which
has a similar relationship between the cache size and real frame size
as illustrated in this disclosure.  In Fig. 1 we observe the
relationship of the 31 bits in the real storage address to the cache
address.  The low order 7 bits are the byte offset within the cache
line and the next 7 bits are the set number within the cache.
Figure 1.  Real storage address relation to cache addresses
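The bit layout of Fig. 1 can be sketched as follows (a Python illustration of the layout described, not part of the original disclosure): with 128-byte lines and 128 sets, the low 7 bits of a real storage address select the byte within the line and the next 7 bits select the set.

```python
LINE_SIZE = 128  # bytes per cache line (2**7)
NUM_SETS = 128   # sets in the cache (2**7)

def byte_in_line(addr):
    """Low-order 7 bits: offset of the byte within the 128-byte line."""
    return addr & (LINE_SIZE - 1)

def set_number(addr):
    """Next 7 bits: which of the 128 sets the line maps to."""
    return (addr >> 7) & (NUM_SETS - 1)
```

For example, address 0x3FFF falls in set 127 at byte offset 127, while address 0x4000 wraps back to set 0.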

      The format of a 64K 4-way set associative cache is shown in
Fig. 2.  There are 128 sets, each of which contains four 128-byte
entries.  To locate an entry in the cache, one uses the set number
from the real storage address to index into the cache.  Then one of
the four entries within the set is selected.  If an entry must be
replaced, then a typical algorithm, such as LRU (Least Recently
Used), is used.
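The lookup and replacement just described can be sketched as follows; this is a minimal software model of the 128-set, 4-way, LRU organization, not the actual hardware implementation:

```python
from collections import OrderedDict

NUM_SETS, WAYS, LINE_SIZE = 128, 4, 128

class SetAssocCache:
    def __init__(self):
        # one LRU-ordered dict of line tags per set
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def reference(self, addr):
        """Touch the line containing addr; return True on a hit."""
        tag = addr // (LINE_SIZE * NUM_SETS)   # bits above the set number
        s = self.sets[(addr >> 7) & (NUM_SETS - 1)]
        if tag in s:
            s.move_to_end(tag)                 # mark most recently used
            return True
        if len(s) == WAYS:                     # set full: evict LRU entry
            s.popitem(last=False)
        s[tag] = False                         # value unused; keys model tags
        return False
```

Referencing five distinct lines that map to the same set evicts the least recently used of the four residents, which is the "start-up transient" mechanism described earlier.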
Figure 2.  Format of 64K 4-way set associative cache

      We next look at the effect of sequentially referencing every
byte (at least one byte from every 128-byte line) of a single 4K page
frame.  In Fig. 3 there
is a 31-bit real storage address where the high order 19 bits are the
frame number and the low order 12 bits are the offset within the
frame.
Figure 3.  Real Storage Address in Relation to Real Frame Address

      Note that since the set number and byte in line (see Fig. 1)
are the low 14 bits of the real storage address, they overlap the low
order 2 bits of the frame number (see Fig. 3); thus all lines
generated from a single real storage frame have a common two bits in
the set number.  This means that for a given frame, the cache lines
generated for that frame all go to the sa...