Method for switching a memory order model with a CPU instruction

IP.com Disclosure Number: IPCOM000128943D
Publication Date: 2005-Sep-21
Document File: 3 page(s) / 12K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for switching a memory order model (MOM) with a central processor unit (CPU) instruction. Benefits include improved functionality and improved performance.

This is the abbreviated version, containing approximately 38% of the total text.

Background

              Memory instructions are processed in accordance with a memory order model (MOM). In general, strong MOMs are conceptually easier to reason about, especially for programmers. However, strong models impose hardware restrictions that may reduce performance unnecessarily. High-performance processors rely on complicated hardware structures and flows to sustain performance despite a strong MOM, whereas simpler hardware can provide equivalent performance under a weak MOM. Placing multiple processors on a single silicon chip adds complexity to the implementation of the memory order model and raises the cost of development, silicon surface area, and power.

              Write-back caches are more complex to design than write-through caches, but they are generally required for high-level caches so that ownership of blocks can be cached and the store stream can hit in the cache. Cores, in contrast, have write-through low-level and high-level caches, which eliminates the requirement for error correction code (ECC) in those caches and for a victim extraction flow to handle cache-line replacement or forwarded requests from other cores.

              If a store misses in a write-back cache or coalescing merge buffer, the store must launch a request to the lower-level cache hierarchy or to memory. Stores must become visible in the correct order. If the underlying memory subsystem is unordered, each store must wait for prior store misses to resolve, which results in a significant performance loss. Alternatively, stores must request cache-line ownership out of order and speculatively, before the store has committed.
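The cost difference between the two alternatives above can be illustrated with a toy cycle count. This is a minimal sketch under stated assumptions, not a model of any real pipeline: the function names, the one-cycle commit cost, and the single fixed `miss_latency` are all hypothetical, and the speculative case assumes every ownership request can be launched at time 0.

```python
from typing import Sequence


def serialized_store_cycles(hits: Sequence[bool], miss_latency: int) -> int:
    """Cycles to commit all stores when each miss is requested only after
    every earlier store has resolved, so misses never overlap."""
    time = 0
    for hit in hits:
        if not hit:
            time += miss_latency  # wait for this store's own miss to resolve
        time += 1  # assumed cost: one cycle to commit the store
    return time


def prefetched_store_cycles(hits: Sequence[bool], miss_latency: int) -> int:
    """Cycles to commit all stores when ownership for every miss is requested
    speculatively at time 0, while the stores still commit in program order."""
    time = 0
    for hit in hits:
        ready = 0 if hit else miss_latency  # cycle at which ownership arrives
        time = max(time, ready) + 1  # in-order commit, one cycle per store
    return time
```

With the store stream `[False, True, False, True]` (miss, hit, miss, hit) and a miss latency of 100 cycles, the serialized count is 204 cycles and the speculative count is 104 cycles, because the second miss overlaps with the first instead of waiting behind it.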

              Because multiple processors can bid for ownership of the same blocks, a global order must be enforced. More recent senior stores may have to relinquish ownership that they acquired before older senior stores acquired theirs. Failure to enforce the ordering may result in a deadlock induced by the memory order model. A processor implementing a weak memory order model can instead graduate senior stores to different addresses out of order without deadlocking or violating a total store order required by the memory order model.
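The out-of-order graduation described above can be sketched as a small simulation. It is an illustration only, under these assumptions: `Store`, `graduate_in_order`, and `graduate_weak` are hypothetical names, ownership arrives one address at a time, and stores to the same address always graduate in program order. It does not model the ownership bidding or the deadlock case.

```python
from typing import Dict, List, Sequence, Tuple

Store = Tuple[str, int]  # (address, value), listed in program order


def graduate_in_order(stores: Sequence[Store]) -> Dict[str, int]:
    """Reference result: every store becomes visible in program order."""
    memory: Dict[str, int] = {}
    for addr, value in stores:
        memory[addr] = value
    return memory


def graduate_weak(
    stores: Sequence[Store], ownership_arrival: Sequence[str]
) -> Tuple[List[int], Dict[str, int]]:
    """Graduate senior stores as ownership of each address arrives.

    Stores to different addresses may graduate out of program order;
    stores to the same address keep their program order. Stores whose
    ownership never arrives stay pending and are not written.
    """
    pending = list(range(len(stores)))  # indices of stores not yet graduated
    order: List[int] = []  # indices in the order the stores became visible
    memory: Dict[str, int] = {}
    for owned in ownership_arrival:
        still_pending = []
        for i in pending:
            addr, value = stores[i]
            if addr == owned:
                memory[addr] = value  # ownership held, so the store graduates
                order.append(i)
            else:
                still_pending.append(i)
        pending = still_pending
    return order, memory
```

For the stores `[("A", 1), ("B", 2), ("A", 3)]` with ownership arriving as `["B", "A"]`, the visibility order is `[1, 0, 2]`: the store to `B` graduates ahead of the older store to `A`, yet the final memory contents match the in-order result.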

              Microprocessor designs must adhere to the strong memory order model to maintain backward compatibility, even though most programs are single-threaded and therefore do not require a strong memory order model. As a result, very few applications would function improperly with a weak memory order model.

              Multithreaded programs are expected to become more prevalent as multicore processors become widely deployed. Multithreaded programming paradigms increasingly rely on synchronization library routines and primitives for communication between threads. The libraries vary, depending on the underlying memory order model. If the library executes on a p...