
A Cache Technique for Synchronization Variables in Highly Parallel, Shared Memory Systems

IP.com Disclosure Number: IPCOM000128199D
Original Publication Date: 1988-Dec-31
Included in the Prior Art Database: 2005-Sep-15
Document File: 16 page(s) / 57K

Publishing Venue

Software Patent Institute

Related People

Wayne Berke: AUTHOR [+3]

Abstract

Caches have traditionally been used to lower the average latency of memory access. When paired with the individual CPUs of a multiprocessor, they have the additional benefit of reducing the overall load on the processor-memory interconnection. Since synchronization variables have been identified as centers of memory contention, we have looked at methods of utilizing the cache to minimize this effect. A technique of "polling the cache" is proposed to deal with this problem. Consistency within the caches is maintained with a bottleneck-free update facility that exploits the topology of the multistage network. Since an indiscriminate broadcast-on-write policy can lead to severe network congestion at high levels of parallelism, we selectively invoke these updates from the software. We illustrate our methods with a number of useful synchronization algorithms and present simulation results that support the feasibility of our design. In addition to providing support for basic synchronization operations, our methodology is generally applicable to all parallel algorithms that utilize polling.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 6% of the total text.


A Cache Technique for Synchronization Variables in Highly Parallel, Shared Memory Systems

by Wayne Berke

Ultracomputer Note #151, December 1988. This work was supported in part by the Applied Mathematical Sciences Program of the US Department of Energy under contract DE-AC02-76ER03077, in part by the National Science Foundation under grant DCR-8413359, and in part by I.B.M. under joint study agreement N00039-84-R-0605(Q).


1. Introduction

The shared memory MIMD computer is becoming the vehicle of choice for many parallel programmers due to the simplicity of the programming model when compared to that of distributed memory machines such as hypercubes and multi-dimensional meshes. Although the shared memory paradigm can be imposed on these latter machines, the dichotomy between software model and underlying architecture will limit peak performance.

A dominant characteristic of any shared memory machine is the manner in which PEs are connected with memories. Those with a small to moderate number of PEs can work well with either a crossbar [Wulie72] or a simple bus [ThGF88] [Hill86]. However, in considering a highly parallel machine (with hundreds of PEs, for example), busses become too easily saturated and crossbars become too expensive. In constructing an N-PE multiprocessor where N is large, a multistage network [GoLi73] [Pate81] [Gott87] can offer a good compromise between hardware cost (O(N log N)) and bandwidth (O(N)). Unfortunately, multistage networks are also subject to saturation, especially while supporting programs that display non-uniformity in their access patterns [PfNo85]. When such hot spot behavior occurs, the memory contention becomes so severe that it affects the latency of requests to all of memory. Pfister and Norton identify synchronization varia...