
Efficient and Correct Execution of Parallel Programs That Share Memory

IP.com Disclosure Number: IPCOM000128242D
Original Publication Date: 1986-Dec-31
Included in the Prior Art Database: 2005-Sep-15
Document File: 26 page(s) / 69K

Publishing Venue

Software Patent Institute

Related People

Dennis Shasha: AUTHOR
Marc Snir: AUTHOR

Abstract

In this paper, we consider an optimization problem that arises in the execution of parallel programs on shared memory multiple-instruction stream multiple-data stream (MIMD) computers. A program on such a machine consists of many program segments each executed sequentially by a single processor. The processors have access to shared memory, and can execute standard memory access operations on this shared memory. This memory is distributed among many separate memory modules. A network connects processors to memory modules. Delays on this network are stochastic. Thus, operations issued by a processor to distinct memory modules may not be executed as memory requests on those modules in the order they were issued.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 6% of the total text.

Efficient and Correct Execution of Parallel Programs That Share Memory

by Dennis Shasha and Marc Snir†

Ultracomputer Note #96
Technical Report #206
March 1986

† The work of the first author was partially supported by the National Science Foundation under grant number DCR-8501611 and by the Office of Naval Research under grant number N00014-85-K-0046. The work of the second author was done while the author was in residence at the Courant Institute.

Address for Marc Snir: Institute of Mathematics and Computer Science, Hebrew University of Jerusalem.

... should take effect before p2, so that y = 0. If the accesses to shared memory in either the first program segment or in the second program segment are executed out of order, then it is quite possible to obtain this inconsistent result (figure 1). Note, however, that this is the only impossible result: it is the only outcome that a sequentially consistent execution cannot produce. For example, the access pattern

(Equation Omitted)

would yield

(Equation Omitted)

This would be entirely acceptable, since it produces the same results as the interleaving

(Equation Omitted)
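
The equations above are omitted from this extract, but the phenomenon they illustrate is the classic one. The following is a hypothetical reconstruction in modern C++ (not the paper's own notation): two program segments share variables x and y, both initially 0, and relaxed atomic orderings stand in for the stochastic network delays that let a segment's two accesses take effect at their memory modules out of issue order.

    #include <atomic>
    #include <cstdio>
    #include <thread>

    // Hypothetical reconstruction of the two-segment example; the paper's
    // actual figure 1 is omitted from this extract. Both start at 0.
    std::atomic<int> x{0}, y{0};
    int r1, r2;

    void segment1() {
        // Issued in program order, but with relaxed ordering the two
        // accesses may take effect at their memory modules out of order.
        x.store(1, std::memory_order_relaxed);
        r1 = y.load(std::memory_order_relaxed);
    }

    void segment2() {
        y.store(1, std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);
    }

    int main() {
        std::thread t1(segment1), t2(segment2);
        t1.join(); t2.join();
        // Under sequential consistency at least one segment must observe
        // the other's store, so r1 == 0 && r2 == 0 is impossible. With
        // the relaxed orderings above (modeling reordering network
        // delays), this "impossible" outcome can actually be observed.
        std::printf("r1 = %d, r2 = %d\n", r1, r2);
        return 0;
    }

Replacing memory_order_relaxed with memory_order_seq_cst corresponds to forcing each access to take effect before the next is issued, which rules the inconsistent outcome out.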

This example has shown that it may be necessary for a processor to ensure that certain operations are executed in the order they appear in a program segment (even when there are no data dependencies within that segment). We assume that a processor may delay a memory access until acknowledgment of some previous memory request is received. This is the only mechanism we allow: there is no inter-segment communication other than through shared variables. Conflicts may occur only when shared variables are modified. Thus, accesses to private variables (i.e., variables that occur in one program segment only) cannot cause problems. The same is true for accesses to shared read-only variables. Private and read-only variables are easy to detect. The following simple policy guarantees correct execution of parallel code (see [EGK]):

When an access to a shared memory location is issued, wait for acknowledgment if a shared read-write variable is accessed; otherwise issue the next operation immediately. (A sketch of this policy appears at the end of this extract.)

We show in this paper that it is possible to improve on this simple policy by doing a more elaborate analysis of data dependencies. We give a characterization of the minimal set of delays that must be introduced in the code in order to enforce sequential consistency.

We have addressed so far the problem of serializing the execution of atomic machine operations. This is the issue facing a machine designer. A high-level language designer or a user faces a different issue: for them the sequential consistency principle should apply to those operations in the high-level language that are defined to be atomic. (This principle turns out to be far closer to the principle of serializability for database systems.) When high-level language constructs are compiled into machine code, one high-level atomic operation translates into possibly several machine operations. E...
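
Returning to the simple policy quoted above, here is a minimal C++ sketch of it; the VarClass classification, the Access type, and the issue/wait_for_ack hooks are assumptions introduced for illustration, not the paper's notation. It delays the next operation only behind accesses to shared read-write variables, exactly as the policy prescribes.

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical classification of variables; the paper notes that
    // private and shared read-only variables are easy to detect.
    enum class VarClass { Private, SharedReadOnly, SharedReadWrite };

    struct Access {            // one memory operation in a segment
        std::string variable;
        VarClass    cls;
    };

    // Stand-ins for the machine's issue/acknowledgment mechanism
    // (assumed for this sketch).
    void issue(const Access& a) {
        std::cout << "issue access to " << a.variable << '\n';
    }
    void wait_for_ack(const Access& a) {
        std::cout << "  ...wait for acknowledgment of " << a.variable << '\n';
    }

    // The simple policy: after issuing an access to a shared read-write
    // variable, wait for its acknowledgment before issuing the next
    // operation; all other accesses are issued back to back.
    void run_segment(const std::vector<Access>& segment) {
        for (const Access& a : segment) {
            issue(a);
            if (a.cls == VarClass::SharedReadWrite)
                wait_for_ack(a);
        }
    }

    int main() {
        run_segment({
            {"x", VarClass::SharedReadWrite},  // delays the next operation
            {"t", VarClass::Private},          // issued with no delay
            {"c", VarClass::SharedReadOnly},   // issued with no delay
            {"y", VarClass::SharedReadWrite},
        });
        return 0;
    }

The analysis described in the paper would improve on this blanket rule by keeping only those waits that are actually needed to enforce sequential consistency, rather than one after every shared read-write access.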