
Improving Multi-Threaded Technology Based Processor Performance by Removing False Sharing and Enabling Global Store to Load Forwarding

IP.com Disclosure Number: IPCOM000021030D
Publication Date: 2003-Dec-17
Document File: 3 page(s) / 45K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for an enhanced architecture to remove the false sharing problem associated with Multi-Threaded (MT) enabled processors via hardware modifications. Benefits include improving performance.

This is the abbreviated version, containing approximately 56% of the total text.

Background

False sharing is a serious source of performance loss in MT-enabled processors. It triggers a machine clear event, which in turn flushes the pipeline. The root cause is the lack of ‘global visibility’ of data dependencies between the data accessed by the two logical processors. The existing way to avoid false sharing is to change where data structures are placed in a program; however, this requires access to, and modification of, the source code.
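The source-level workaround mentioned above can be illustrated with a short sketch. It assumes a 64-byte cache line and uses Python's `ctypes` only to inspect field offsets. The structure and function names (`Shared`, `Padded`, `shares_line`) are hypothetical. Padding the first per-thread counter out to a full line moves the second counter onto its own cache line. That is exactly the kind of data-placement change that requires editing the source.

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size in bytes


class Shared(ctypes.Structure):
    # Two per-thread counters packed side by side: they sit on the same
    # cache line, so writes from two threads falsely share it.
    _fields_ = [("count_t0", ctypes.c_int64),
                ("count_t1", ctypes.c_int64)]


class Padded(ctypes.Structure):
    # Source-level remedy: pad the first counter out to a full line so the
    # second counter starts on the next cache line.
    _fields_ = [("count_t0", ctypes.c_int64),
                ("_pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_int64))),
                ("count_t1", ctypes.c_int64)]


def shares_line(struct_type, a: str, b: str) -> bool:
    """True if fields `a` and `b` start on the same cache line, assuming the
    structure itself begins on a cache-line boundary."""
    line_a = getattr(struct_type, a).offset // CACHE_LINE
    line_b = getattr(struct_type, b).offset // CACHE_LINE
    return line_a == line_b
```

The padding costs 56 bytes per structure. This memory-for-isolation trade is the reason the disclosure looks for a hardware fix that needs no source changes.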

False sharing is caused by a combination of the following architectural features:

  • For MT processors, each logical processor has its own load and store buffers.
  • Each store updates an entire cache line at a time.
  • When two stores, or a load and a store, on separate threads occur at the same time and target different data objects on the same cache line, they must be serialized for data consistency. (Note: accesses to the same data object are serialized explicitly at the software level.)
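The conditions listed above can be expressed as a predicate over two concurrent accesses. This is a minimal model that assumes a 64-byte cache line. The `Access` record and the `lines`, `overlaps` and `is_false_sharing` functions are hypothetical names introduced for illustration, not part of the disclosure.

```python
from __future__ import annotations

from dataclasses import dataclass

CACHE_LINE = 64  # assumed cache-line size in bytes


@dataclass(frozen=True)
class Access:
    thread: int     # logical processor that issued the access
    addr: int       # starting byte address
    size: int       # access width in bytes
    is_store: bool


def lines(a: Access) -> set[int]:
    """Cache-line indices touched by access `a`."""
    first = a.addr // CACHE_LINE
    last = (a.addr + a.size - 1) // CACHE_LINE
    return set(range(first, last + 1))


def overlaps(a: Access, b: Access) -> bool:
    """True if the two accesses touch at least one common byte."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


def is_false_sharing(a: Access, b: Access) -> bool:
    """Concurrent accesses falsely share a line when they come from different
    logical processors, at least one is a store, they touch a common cache
    line, and their bytes do not overlap (different data objects)."""
    return (a.thread != b.thread
            and (a.is_store or b.is_store)
            and bool(lines(a) & lines(b))
            and not overlaps(a, b))
```

Overlapping accesses return `False` here because they are true sharing. The note above leaves that case to explicit software serialization.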

General Description

The disclosed method avoids pipeline flushes, and the resulting loss of performance, by serializing potentially conflicting memory access instructions in different threads (see Figure 1). It combines two ideas:

  1. Using global data access knowledge to schedule and serialize memory access instructions, thus preventing false sharing and the corresponding performance degradation.
  2. Enabling global data forwarding from the store buffer of one logical processor to a load instruction on the other logical processor.
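The two ideas can be sketched together as a load-scheduling decision that consults every logical processor's store buffer, not only its own. This is a hypothetical behavioral model, not the disclosed hardware. The following are all assumptions:

- the names (`schedule_load`, `StoreEntry`, `LogicalProcessor`, `Action`)
- the 64-byte line size
- the youngest-first search order
- the choice to stall instead of flush

The model also ignores the memory-ordering rules a real design would have to respect.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

CACHE_LINE = 64  # assumed cache-line size in bytes


class Action(Enum):
    ISSUE = "issue"          # no conflict: read the cache as usual
    FORWARD = "forward"      # take the data straight from a store buffer
    SERIALIZE = "serialize"  # hold the load until the conflicting store drains


@dataclass
class StoreEntry:
    addr: int    # starting byte address of the pending store
    data: bytes  # bytes written; the store covers [addr, addr + len(data))


@dataclass
class LogicalProcessor:
    # Pending stores, oldest first; each logical processor has its own buffer.
    store_buffer: list[StoreEntry] = field(default_factory=list)


def _check(entry: StoreEntry, addr: int, size: int,
           other_thread: bool) -> Optional[tuple[Action, Optional[bytes]]]:
    """Compare one pending store with a load of `size` bytes at `addr`.
    Assumes no access crosses a cache-line boundary."""
    s_start, s_end = entry.addr, entry.addr + len(entry.data)
    l_start, l_end = addr, addr + size
    if s_start <= l_start and l_end <= s_end:
        # The store covers the whole load: forward the matching bytes.
        return Action.FORWARD, entry.data[l_start - s_start:l_end - s_start]
    if s_start < l_end and l_start < s_end:
        # Partial overlap: the data cannot be forwarded, so wait.
        return Action.SERIALIZE, None
    if other_thread and s_start // CACHE_LINE == l_start // CACHE_LINE:
        # Same line, different bytes, other thread: false sharing.
        # Serialize here instead of letting it cause a machine clear.
        return Action.SERIALIZE, None
    return None


def schedule_load(lps: list[LogicalProcessor], thread: int,
                  addr: int, size: int) -> tuple[Action, Optional[bytes]]:
    """Decide how a load from logical processor `thread` is handled."""
    # Own buffer first (ordinary store-to-load forwarding), then the other
    # logical processors' buffers (global visibility).
    order = [thread] + [t for t in range(len(lps)) if t != thread]
    for t in order:
        for entry in reversed(lps[t].store_buffer):  # youngest store first
            hit = _check(entry, addr, size, other_thread=(t != thread))
            if hit is not None:
                return hit
    return Action.ISSUE, None
```

Suppose logical processor 0 has a pending 8-byte store at `0x1000`. A 4-byte load of `0x1000` on logical processor 1 is then forwarded from that store. A load of `0x1010` on logical processor 1 is serialized, because it falls on the same cache line.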

To prevent false sharing, the disclosed method does the following:

  • Maintains independent store and load buffers for each logical processor.
  • For each memory load or store instruction (micro-op), checks whether there is any conflicting memory operation on the fly, by checking the cache line accessed in the same logical processo...