
Dynamic Parallel Memories

IP.com Disclosure Number: IPCOM000128217D
Original Publication Date: 1983-Dec-31
Included in the Prior Art Database: 2005-Sep-15
Document File: 8 page(s) / 29K

Publishing Venue

Software Patent Institute

Related People

Uzi Vishkin: AUTHOR [+4]

Abstract

The current state of technology implies that memories which include many cells must be partitioned into a number of modules, each containing many cells, where only one cell (or a small number of cells) of each module can be accessed at a time. For more on this, see Kuck [Ku-77] and Gottlieb et al. [GGKMRS-83]. On the other hand, many published parallel algorithms are designed for abstract shared memory models of parallel computation, where the processors have free access to each cell of the shared memory for both read and write purposes. An obvious difficulty arises when one wants to simulate these algorithms on buildable machines. One approach is to require that designers of algorithms for abstract shared memory models of parallel computation limit, as much as possible, the size of the shared memory that the algorithm must use. This is usually done in favor of more local computations in which each processor accesses its own local memory only. Kuck mentions several papers that practiced this ad hoc approach. Even in cases where such a limitation is possible, this approach places an additional, undesirable burden on the designer.

Let us be a bit more precise. Given a shared memory model of parallel computation D, we define M(D) to be the model of computation derived from D by partitioning the shared memory of D into modules so that no more than one cell of each module can be accessed at a time. If there are several simultaneous requests for the same memory location in M(D), they are treated in the same way as in D. If there are several simultaneous requests for different cells of the same module, they are queued and responded to one at a time. The granularity problem is defined as the problem of simulating a cycle of D by M(D). Automatic solutions for the general case, where we do not know anything about the cycle to be simulated, are discussed in Mehlhorn and Vishkin [MV-82]. They suggest a multi-stage approach for attacking the granularity problem. We mention the two main stages. The first stage, designed to keep us 'out of trouble' in the average case, utilizes universal hashing in the simulating machine M(D): before each simulation of an algorithm, M(D) picks a hash function at random from an entire class of hash functions, instead of using one specific hash function. This is shown to keep memory contention low. The idea behind the second stage is to keep several copies of each memory address in distinct memory modules. This idea, in conjunction with fast algorithms for picking the 'right' copy of each requested address, is shown to decrease memory contention for the worst-case results of the first stage. The main result of the present paper is that in a few general cases the idea of dynamically changing the locations of addresses among modules throughout the execution of an algorithm provides a solution for the granularity problem in constant time, utilizing only as many modules as the number of processors.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 15% of the total text.



Dynamic Parallel Memories*

Ultracomputer Note #54 (April 1983; revised June 1983)

*To appear in Information and Control.

Uzi Vishkin, Courant Institute, New York University
Avi Wigderson, Department of Electrical Engineering and Computer Science, University of California at Berkeley

1. Introduction

The current state of technology implies that memories which include many cells must be partitioned into a number of modules, each containing many cells, where only one cell (or a small number of cells) of each module can be accessed at a time. For more on this, see Kuck [Ku-77] and Gottlieb et al. [GGKMRS-83]. On the other hand, many published parallel algorithms are designed for abstract shared memory models of parallel computation, where the processors have free access to each cell of the shared memory for both read and write purposes. An obvious difficulty arises when one wants to simulate these algorithms on buildable machines. One approach is to require that designers of algorithms for abstract shared memory models of parallel computation limit, as much as possible, the size of the shared memory that the algorithm must use. This is usually done in favor of more local computations in which each processor accesses its own local memory only. Kuck mentions several papers that practiced this ad hoc approach. Even in cases where such a limitation is possible, this approach places an additional, undesirable burden on the designer.

Let us be a bit more precise. Given a shared memory model of parallel computation D, we define M(D) to be the model of computation derived from D by partitioning the shared memory of D into modules so that no more than one cell of each module can be accessed at a time. If there are several simultaneous requests for the same memory location in M(D), they are treated in the same way as in D. If there are several simultaneous requests for different cells of the same module, they are queued and responded to one at a time.
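To make the queuing rule concrete, here is a minimal sketch (ours, not from the paper) that counts how many cycles M(D) needs to serve the requests issued in one cycle of D, given an assumed address-to-module map. Requests to the same address coalesce as in D; requests to distinct cells of one module are served one per cycle, so the busiest module determines the cost.

    # A sketch of one D-cycle simulated on M(D); the module map is an
    # illustrative assumption, not the paper's construction.
    from collections import defaultdict

    def md_cycles(requests, num_modules, module_of):
        # Group the distinct requested cells by the module holding them;
        # simultaneous requests for the same address combine, as in D.
        cells_per_module = defaultdict(set)
        for addr in set(requests):
            cells_per_module[module_of(addr, num_modules)].add(addr)
        # Different cells of one module are queued and served one at a
        # time, so the cost equals the busiest module's queue length.
        return max((len(c) for c in cells_per_module.values()), default=0)

    naive = lambda addr, m: addr % m           # address modulo module count

    p = m = 8
    print(md_cycles([i * m for i in range(p)], m, naive))  # -> 8 cycles
    print(md_cycles(list(range(p)), m, naive))             # -> 1 cycle

The first request pattern sends all eight requests to module 0, so one cycle of D costs eight cycles of M(D); the second spreads them perfectly and costs one.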

The granularity problem is defined as the problem of simulating a cycle of D by M(D). Automatic solutions for the general case, where we do not know anything about the cycle to be simulated, are discussed in Mehlhorn and Vishkin [MV-82]. They suggest a multi-stage approach for attacking the granularity problem. We mention the two main stages. The first stage, designed to keep us 'out of trouble' in the average case, utilizes universal hashing in the simulating machine M(D): before each simulation of an algorithm, M(D) picks a hash function at random from an entire class of hash functions, instead of using one specific hash function. This is shown to keep memory contention low. The idea behind the second stage is to keep several copies of each memory address in distinct memory modules. This idea, in conjunction with fast algorithms for picking the 'right' copy of each requested address, is shown to decrease memory contention for the worst-case results of the first stage. The main result of the present paper is that in a few general cases the idea of dynamically changing the locations of addresses among modules throughout the execution of an algorithm provides a solution for the granularity problem in constant time, utilizing only as many modules as the number of processors.
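As a rough illustration of the first stage, the sketch below draws a hash function at random before a run and uses it to scatter a request pattern that is adversarial for the naive modulo map. The family used here, h(x) = ((a*x + b) mod p) mod m of Carter and Wegman, is our assumption for illustration and need not be the class used in [MV-82].

    # Minimal universal-hashing sketch; the family and parameters are
    # illustrative assumptions, chosen to show random module assignment.
    import random
    from collections import Counter

    P = (1 << 61) - 1        # a Mersenne prime larger than any address used

    def random_hash(num_modules):
        # Draw h(x) = ((a*x + b) mod P) mod num_modules at random.
        a = random.randrange(1, P)
        b = random.randrange(P)
        return lambda addr: ((a * addr + b) % P) % num_modules

    m = 64
    h = random_hash(m)                    # picked anew before each simulation
    requests = [i * m for i in range(m)]  # all collide under the naive addr % m
    loads = Counter(h(addr) for addr in requests)
    print(max(loads.values()))            # typically a small constant, not 64

Because the function is drawn at random after the request pattern is fixed, no single pattern is bad for more than a small fraction of the family, which is what keeps the expected queue at every module short.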