Browse Prior Art Database

Implementing locks in a shared-memory multiprocessor using a simplified coherence protocol

IP.com Disclosure Number: IPCOM000200052D
Publication Date: 2010-Sep-24
Document File: 5 page(s) / 32K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is an alternative lock implementation in multiprocessor computer architecture that avoids inefficiencies in current software-based lock implementations - busy loop to evaluate the availability of the lock, inefficient coherence protocols, and, false sharing of locks within a cacheline. On a lock read, an "Exclusive" copy of the cacheline is provided to the requester, rather than the traditional "Shared" copy.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 21% of the total text.

Page 01 of 5

Implementing locks in a shared-memory multiprocessor using a simplified coherence protocol

Synchronization Primitives

A crucial aspect of the performance of shared-memory multiprocessor systems is the efficiency of the synchronization primitives provided by the system. Synchronization primitives provide the fundamental abstractions upon which synchronization routines are built by system software or library routines. These are in turn used by application software. Synchronization routines are necessary for multiple threads of execution to coordinate mutually exclusive access to shared resources (critical sections using locks), and to coordinate the order of operations between different threads (flags and barriers). Fundamentally, there are three types of synchronization operations required by shared memory multi-threaded programs:

1. Locks for mutual exclusion between threads
2. Flags for point-to-point ordering between threads
3. Barriers for coordinating progress across multiple threads

Of these three, locks are the most interesting from an architectural implementation point of view because they need special support in hardware to allow an atomic read and write of a variable. Flags can be implemented without locks by using only the write atomicity guarantees provided by the coherence protocol, and barriers can be implemented by using locks.

Current Implementations

Many architectures provide special instructions in the Instruction Set Architecture (ISA) to enable library routines or system software routines to be developed for implementing locks. There are two common implementations:
1. Atomic instructions: instructions such as test-and-set, compare-and-swap, fetch-and-increment etc,
2. Instruction pair: instructions such as load

_link/store

conditional pair (LL/SC pair).

It is generally accepted that the LL/SC pair is more efficient than any atomic instruction; The LL/SC pair improves performance over atomic instructions by avoiding the large number of invalidations that could happen at each lock release; only the first
store

_conditional from contending threads is allowed to succeed. If the coherence

controllers in the corresponding processors observe the invalidation resulting from the first store

_conditional, the other store-conditionals (from the other contending threads)

will not even be sent over the interconnect.

Drawbacks of current lock implementations

Drawback 1: Busy looping

Even though LL/SC is generally a better solution than atomic instructions, it does not

_

1


Page 02 of 5

avoid a busy loop in the routine implementing the lock function. Busy looping results in the waste of processor resources and may prevent other processes from being scheduled. When software tries to acquire a lock, it executes code to LOCK a lock variable similar to what is shown in Figure 1 in pseudo-assembler. Figure 2 shows the corresponding code to UNLOCK a lock variable. The main observation is that even though the LL/SC pair reduces traffic...