Cache Touch Stack

IP.com Disclosure Number: IPCOM000112596D
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 111K

Publishing Venue

IBM

Related People

Van Fleet, JW: AUTHOR

Abstract

There are a number of points which can be made, both pro and con, regarding multiprocessor architectures which use only software protocols for cache coherency. There are strong reasons to examine multiprocessors without direct cache coherency maintenance in the hardware. The benefits of such a multiprocessor complex are many:

Cache Touch Stack

      There are a number of points which can be made, both pro and
con, regarding multiprocessor architectures which use only software
protocols for cache coherency.  There are strong reasons to examine
multiprocessors without direct cache coherency maintenance in the
hardware.  The benefits of such a multiprocessor complex are many:

o   less expensive and simpler memory subsystems

o   less impact on the performance of uniprocessors in the same
    processor family

o   the capability to have a greater number of processors share the
    same memory.

      On the other side, it is generally more difficult to provide
correct and efficient kernel services.  There is also the question of
the impact on the general user/process, which is discussed below.
Performance gain on a multiprocessor comes in two forms: throughput,
where the number of work units which can be processed in a unit of
time increases, and speed-up, where a single entity (a single job)
decreases its own elapsed time.  Speed-up generally requires changes
to the user program to introduce parallelization and synchronization.
In standard cache coherent machines, the hardware provides the
mechanisms which support storage sharing at fine granularities with
good performance.  Without cache coherence protocols in the hardware,
fine granularity sharing at the application program level would
require compiler assistance and/or user programmer controls to
provide adequate performance.  User level changes are generally not
included in throughput considerations.  In fact, for the general
user/process, the programming model is unchanged.  That is, no
consideration is made for multiprocessing, nor is any benefit gained
by a single process/user from the presence of multiple processors.
For single process applications the programming paradigm has not
changed as a result of the presence of multiple processors.
Performance benefits accrue in the throughput case simply by adding
more work -- more user level processes.  For throughput, then, the
question becomes one of how to provide performance-efficient cache
management assists for kernel software.
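
      As a concrete point of reference (a sketch, not part of the
original disclosure), the fragment below shows the kind of cache
management the kernel must perform for itself when the hardware
provides no coherency: shared data is guarded by a lock, the cached
copy is invalidated before it is read, and the update is flushed back
to memory before the lock is released.  The primitive names
(dcache_inval_range, dcache_flush_range, lock_acquire, lock_release)
are hypothetical stand-ins for whatever the machine and kernel
actually provide, and the lock itself is assumed to be built on an
uncached or hardware-assisted atomic operation.

    /*
     * Hypothetical kernel routine; all primitive names are
     * illustrative only.
     */
    struct shared_counter {
        long value;                 /* data shared by all processors */
    };

    extern void dcache_inval_range(void *addr, unsigned long len);
                                    /* discard the cached copy       */
    extern void dcache_flush_range(void *addr, unsigned long len);
                                    /* write dirty data back         */
    extern void lock_acquire(volatile int *lock);
    extern void lock_release(volatile int *lock);

    static volatile int counter_lock;
    static struct shared_counter counter;

    void counter_increment(void)
    {
        lock_acquire(&counter_lock);

        /* Invalidate first so the read sees the copy in memory,
           not a stale line left in this processor's cache.       */
        dcache_inval_range((void *)&counter, sizeof(counter));

        counter.value++;

        /* Flush the updated line to memory before releasing the
           lock, so the next processor's invalidate-then-read sees
           the new value.                                          */
        dcache_flush_range((void *)&counter, sizeof(counter));

        lock_release(&counter_lock);
    }

      Every such sequence adds path length that a cache coherent
machine handles transparently, which is why an efficient hardware
assist for kernel storage management is attractive.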

      This article describes an architecture for cache management
hardware which will assist the kernel in maintaining its storage
consistency while continuing to provide the major benefits of an
architecture which does not have hardware cache coherency.  The
problem to be solved is to maintain program correctness and to
achieve acceptable levels of performance.  Kernel correctness itself
is not discussed in this article; that difficult problem is assumed
to be solved.  In a multiprocessing (MP) system, this means that the
data is correctly synchronized with locking structures of some sort.
Additionally, a correctness attribute present in cache coherent MP
systems is the size of the atomic unit.  In most multiprocessing
architectures with hardware cache coherency, the atomic unit is a
machine byte or machine wor...
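
      To make the atomic-unit point concrete, the following
hypothetical fragment (an illustration, not taken from the article)
shows how cache-line granularity can lose an update when hardware
coherency is absent.  Two processors store to adjacent bytes that
happen to share a cache line; each store dirties that processor's
private copy of the line, and the copy that is cast out last
overwrites the other processor's update.

    /* Hypothetical fragment illustrating the atomic-unit problem. */
    struct packed_flags {
        char cpu0_flag;          /* written only by processor 0 */
        char cpu1_flag;          /* written only by processor 1 */
    };                           /* both fields fall in one cache line */

    struct packed_flags flags;

    void set_cpu0_flag(void)     /* runs on processor 0 */
    {
        flags.cpu0_flag = 1;     /* dirties CPU 0's copy of the line */
    }

    void set_cpu1_flag(void)     /* runs on processor 1 */
    {
        flags.cpu1_flag = 1;     /* dirties a second, private copy   */
    }
    /*
     * When both dirty copies are eventually cast out, the later
     * write-back carries a stale value for the other processor's
     * field, and one of the two updates is silently lost.
     */

      On a cache coherent machine both stores survive because the
line is kept consistent between the caches; without coherency the
effective atomic unit grows to the size of the cache line, so
software must either place such fields in separate lines or serialize
access with the lock-and-flush discipline sketched earlier.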