An OpenMP Extension for Architecture-Aware Thread Scheduling

IP.com Disclosure Number: IPCOM000146571D
Publication Date: 2007-Feb-16
Document File: 3 page(s) / 171K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method that extends the current OpenMP parallel programming model to efficiently map a parallel application onto future multi-core/many-core architectures. Through a new "group" OpenMP primitive, the compiler or OS schedules the associated threads onto closely-coupled cores/processors. Benefits include reduced cache-coherency traffic.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Background

With the proliferation of multi-core/many-core architectures, the gap between parallel programming languages (e.g., OpenMP) and the underlying hardware architecture becomes more prominent. This is particularly true for mapping and scheduling spawned threads onto the most suitable processors. Traditionally, this is the job of the OS, which uses specific scheduling techniques to improve cache locality and load balancing. However, current OS schedulers (including those of Windows and Linux) do not fully exploit the architectural features of the target machine; these features include several cores sharing a Last Level Cache (LLC) within a dedicated cache hierarchy, and several LLCs interconnected with a ring or crossbar (a popular cache-hierarchy design in current SMP machines). Therefore, it is crucial to extend current scheduling algorithms to minimize data transfer among different LLCs and reduce cache-coherency traffic, improving parallel application performance.

General Description

The disclosed method uses an OpenMP extension with a new primitive, "group", to express the affinities of the working threads. For example, in the hierarchical producer/consumer model, a master and its associated worker threads share the same queue. In general, parallel application developers understand the detailed algorithm and the execution pattern among the threads. Therefore, using the new "group" primitive, they can easily group closely-coupled threads together to reflect their affinity relationship. After identifying the OpenMP "group", the compiler or OS schedules the threads of a group onto the dedicated hardware accordingly (e.g., they are more likely to be scheduled within one bank of the many-core architecture). The term "bank" denotes a unit consisting of several cores/processors that share an LLC.
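As a concrete illustration, the proposed primitive might appear as a clause on the existing `parallel` pragma. The sketch below is hypothetical: the `group` clause and the name `worker_pool` are assumptions about the disclosed extension, not part of any OpenMP standard, so this is illustrative pseudocode rather than compilable code:

```c
#include <omp.h>

void process_queue_item(int tid);   /* application-specific work */

void run_workers(void)
{
    /* "group(worker_pool)" is the proposed (hypothetical) extension:
     * all threads of this region form one affinity group, which the
     * compiler or OS would schedule onto cores that share one LLC
     * (i.e., one "bank"). */
    #pragma omp parallel num_threads(4) group(worker_pool)
    {
        process_queue_item(omp_get_thread_num());
    }
}
```

Because the clause only declares affinity, an implementation that does not understand it could ignore it and fall back to the default scheduling, preserving correctness.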

The primitive “group” can be applied to different parallel patterns, as long as there are dependencies among the working threads. The following examples demonstrate the uses of the disclosed method:

- Hierarchical producer/consumer parallel programming model.
Figure 1 shows a code segment commonly use...