Browse Prior Art Database

Means for Computing the Max of a Set of Variables Distributed Across Many Processors

IP.com Disclosure Number: IPCOM000101783D
Original Publication Date: 1990-Sep-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 5 page(s) / 207K

Publishing Venue

IBM

Related People

Stone, HS: AUTHOR

Abstract

This article describes a way to modify conventional bus implementation schemes that permits software to compute the MAX of variables held across many processors in an average time proportional to log N instead of N. When N processors that share a common bus must cooperate to find the maximum value of individually held local variables, the operation turns out to be a costly one in most multiprocessor systems. To a limited extent the problem is reduced when the processors have local cache memories to reduce bus traffic and use a cache-coherence protocol to maintain consistency of cache data as proposed by Goodman (1). Nevertheless, the computation of the MAX of a set of variables held across many processors requires at least O(N) bus transactions for his scheme.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 30% of the total text.

Means for Computing the Max of a Set of Variables Distributed Across Many Processors

       This article describes a way to modify conventional bus
implementation schemes that permits software to compute the MAX of
variables held across many processors in an average time proportional
to log N instead of N.  When N processors that share a common bus
must cooperate to find the maximum value of individually held local
variables, the operation turns out to be a costly one in most
multiprocessor systems. To a limited extent the problem is reduced
when the processors have local cache memories to reduce bus traffic
and use a cache-coherence protocol to maintain consistency of cache
data as proposed by Goodman (1).  Nevertheless, the computation of
the MAX of a set of variables held across many processors requires at
least O(N) bus transactions for his scheme.  Sweazy and Smith (2)
describe what they believe to be all reasonable variations of the
basic Goodman protocol.  None of these variations produce a protocol
that requires less than O(N) bus transactions for the computation of
MAX.

      The method described here reduces the time to O(log N) average
time from O(N) and is not costly to implement.

      Prior art that is related to this invention is a priority
resolution scheme that is built into bus interfaces such as for the
IEEE 696 bus (3).  This priority resolution scheme allocates a fixed
number of wires on a bus to the function of computing the maximum
priority among the requesters of the bus.  It is useful for finding
the maximum of a set of numbers that are available at one clock
cycle, but is not directly useful for finding the maximum of a set
that appear over a period of time, because it lacks the ability to
store and update a tentative maximum over the time period, and the
ability to detect when the period is over.

      Description of the New Method The method relies on a small
modification of a bus arbiter. A typical conventional scheme for bus
arbitration is shown in Fig. 1.  In this figure, N processors request
access to the bus, and one is granted access by an arbiter. The other
processors rerequest access on the next available cycle.  Our purpose
is to modify standard bus protocols to support the MAX operation for
user programs. The standard protocols may have a MAX operation at the
electrical interface for priority resolution as per the standard (4),
but that MAX operation does not directly solve the user's needs nor
is it accessible to the user.  Here we show that a simple change to
the arbiter makes available to the user the MAX operation in the form
that is required by application programs.  The reason that the
standard bus protocols cannot produce fewer than O(N) transactions to
find the MAX is that each processor must make a request to obtain the
value of MAX.  The first time that a processor makes such a request,
MAX is not held in its cache, so the processor must obtain the
current value o...