Browse Prior Art Database

Adaptive Algorithm to Improve the Performance of Message Passing Code Execution on Shared Memory Parallel Architectures

IP.com Disclosure Number: IPCOM000112242D
Original Publication Date: 1994-Apr-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 122K

Publishing Venue

IBM

Related People

Mraz, R: AUTHOR

Abstract

An algorithm is disclosed that (implemented in microcode and/or hardware) allows shared memory machines to efficiently execute message passing code. The algorithm is adaptive and is particularly effective when the SPMD application code specified wildcard receives. Traditional implementations of a message passing library within a shared memory environment may not offer efficient SPMD (Single, Program, Multiple Data) performance when the system scales beyond several nodes.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Adaptive Algorithm to Improve the Performance of Message Passing
Code Execution on Shared Memory Parallel Architectures

      An algorithm is disclosed that (implemented in microcode and/or
hardware) allows shared memory machines to efficiently execute
message passing code.  The algorithm is adaptive and is particularly
effective when the SPMD application code specified wildcard receives.
Traditional implementations of a message passing library within a
shared memory environment may not offer efficient SPMD (Single,
Program, Multiple Data) performance when the system scales beyond
several nodes.

      The way of a parallel shared memory architecture supports
memory coherence can significantly impact the performance of a
programs execution.  Message passing algorithms can work to reduce
these coherence effects since data transfers are done in block
format.  In traditional implementations, a semaphore is provided to
protect a mailbox buffer for each node to receive data and associated
header information.  This creates contention, through the semaphore
access, since each mailbox is a multiple writer, single reader
problem.  Contention such as this creates overhead and delays in both
coherent and non-coherent shared memory machine implementations.

      The approach described herein is to remove this coherence
problem by having the buffer owned by each sender.  This transforms
the above problem to a single writer, multiple reader contention
problem.  These problems do not require atomic operations such as
semaphores for program correctness.  This straight-forward solution
presents another problem when application code specifies a wildcard
receive.  These wildcard receive cases require a given node to search
all sender's mailboxes to determine which message is for him.  In the
general case, this is of order O(N) where N is the number of
processors participating in the parallel execution since the
application can specify a wildcard receive that allows acceptance of
a message from any source.  The novel algorithm described here relies
on the application programmers node locality of reference to speed-up
mailbox searches.  The implication is that if an SPMD program node
has sent a message to me previously, he is likely to do so again.

      A classic approach to parallel programming that takes full
advantage of this algorithm would be to partition the parallel
application into a grid solver.  No matter how many processors
execute in parallel any node will most likely interact with these
designated as north, south, east and west of a given node.  A cache
or record of the latest referenced sending mailboxes are maintained
at a predefined depth for each node.  When a node is informed, that
messages are awaiting reception and a wildcard receive has been
issued, the receiver begins the search with the cached values of the
m...