Browse Prior Art Database

Programming Language Extensions for Power-Aware Computing on Multi-cores

IP.com Disclosure Number: IPCOM000190829D
Original Publication Date: 2009-Dec-10
Included in the Prior Art Database: 2009-Dec-10

Publishing Venue

IBM

Abstract

The Problem: Multi-core processors are the most viable means of delivering sustainable performance growth for at least the next ten years. One of the primary reasons driving the industry towards multi-cores is the power wall: the performance of a single processor is limited by its power consumption. Furthermore, the cost of maintaining servers, e.g. in data centres, is going up as a result of increased power capacity and cooling expenses. IDC estimates that for every $1.00 spent on new hardware, an additional $0.50 is spent on power and cooling, more than double the amount spent five years ago [1].

As a result, state-of-the-art multi-core processors (such as the Intel Nehalem EP architecture and AMD Phenom) provide per-core dynamic voltage and frequency scaling (DVFS) to trade off performance against power consumption. This technology allows processor cores to run faster than the base operating frequency by reducing the frequency of inactive cores (or shutting them down). Better performance can be achieved by running an application at a higher frequency, or power consumption can be reduced by running it at a lower frequency.

This disclosure proposes two language annotations with which the programmer/user can identify which parts of a program are performance-bound or IO-bound (for instance, memory-bound or device-IO-bound), so that the benefits of DVFS can be exploited. The language extension is simple and straightforward, while allowing the compiler and the operating system to identify which parts of the application are performance (or IO) bound. When an application is executing a performance-bound section, the OS can steal power from other cores (running applications in non-performance-bound sections) to increase the frequency of the current core.
In this case, the overall power consumption does not change, yet we are able to speed up the applications running performance-bound sections. There are also scenarios in which the performance of an application depends on IO communication. Obviously, there is no point in running such an application at a high frequency, because this results in high power consumption without any performance gain. In reality, programs may have both performance-bound and IO-bound sections, and we need an efficient way for the programmer to express this explicitly.

According to research [2], migrating non-performance-critical tasks to low-frequency processors can greatly reduce energy. For example, Qiong Cai et al [2] show that for PageRank, which represents an important category of emerging applications such as web search engines, migrating non-critical tasks to lower-frequency cores achieves more than 40% energy savings without any performance loss on an eight-core system. If we can reduce the frequency of the processor when an application is executing an IO-bound section, we can further reduce overall power consumption. As mentioned above, energy cost has become the primary expense in data centres. Therefore, there is great business value in a simple scheme that leads to energy savings.

State of the art: One naive known solution is to put (fast) threads that are waiting for some event to sleep as soon as they arrive at the barrier, and then shut down the core. This is not feasible unless the energy saved in sleep mode pays off the energy/performance wasted by putting cores to sleep and waking them up later. Modern operating systems allow users to specify the priority of a thread. However, such priority schemes apply only at whole-thread granularity, not to arbitrary parts of an application. This solution is also runtime- and platform-dependent and hence not portable.
Qiong Cai et al [2] proposed using profiling information to identify critical and non-critical threads and adjusting hardware components to accelerate critical threads. However, their approach applies only to loop-based parallel applications and certain scheduling policies (the static scheduling policy in OpenMP). Moreover, the profiling adds extra overhead to the execution of an application, and the hardware adjustments make the approach infeasible on existing commodity processors. In contrast to existing solutions, we believe that the programmer can interact with the compiler and the runtime system in a better way: he/she can use domain knowledge to inform the compiler and the OS about the behaviour of the application.

Summary: To sum up, this disclosure provides a way for the programmer to convey knowledge to the runtime system. With this knowledge, we are able to take advantage of the asymmetric DVFS provided by the latest multi-cores to achieve power-aware computing. As mentioned above, power consumption is becoming increasingly important for most data centres. Unlike previous solutions, we propose a simple language extension that allows the programmer/user to explicitly identify which parts of the application are performance- or IO-bound, so that the OS can balance the constraints of performance and energy consumption.



Programming Language Extensions for Power-Aware Computing on Multi-cores

We propose a simple and straightforward high-level language extension that allows the programmer to express which parts of a program are bound by CPU or IO performance. We use the C programming language to describe our solution for clarity; the solution can be applied to all programming languages (such as Java, C++, Python and so on), and similar approaches or expressions should be covered by this disclosure.

The programmer can use annotations to mark CPU/IO-bound sections of the program at the source-code level (as illustrated in figures 1 and 2). The compiler then transforms them into an intermediate representation, as shown in figures 3 and 4. How to represent and implement these kinds of annotations is out of the scope of this disclosure.
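Since the concrete annotation syntax is left open, one possible (hypothetical) source-level encoding is a pair of marker macros that the compiler or a preprocessing pass recognises and rewrites into the runtime calls of figures 3 and 4. The macro names and the handle_query example below are illustrative assumptions, not part of the disclosure:

```c
#include <stdio.h>

/* Hypothetical encoding of the annotations as no-op marker macros that a
   compiler pass can recognise and lower to __enter/__exit runtime calls. */
#define CPU_BOUND_BEGIN()  /* marker: start of a performance-bound section */
#define CPU_BOUND_END()    /* marker: end of a performance-bound section */
#define IO_BOUND_BEGIN()   /* marker: start of an IO-bound section */
#define IO_BOUND_END()     /* marker: end of an IO-bound section */

int queries_served = 0;

void handle_query(void) {
    CPU_BOUND_BEGIN();      /* compute-heavy ranking: frequency may be raised */
    queries_served++;       /* stand-in for the actual computation */
    CPU_BOUND_END();

    IO_BOUND_BEGIN();       /* writing results is IO-bound: frequency can drop */
    /* write results to disk or network here */
    IO_BOUND_END();
}
```

Because the macros expand to nothing, annotated code compiles unchanged on toolchains that do not implement the extension.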

void transaction_1()
{
    computation_1();
    __enter_cpu_bound_section();
    load_database_entry();
    sort(); store();
    __exit_cpu_bound_section();
    release(); maintenance_processes();
}

Figure 3 Intermediate representation of the performance-bound section

void transaction_1()
{
    computation_1();
    __enter_IO_bound_section();
    load_database_entry();
    sort(); store();
    __exit_IO_bound_section();
    release(); maintenance_processes();
}

Figure 4 Intermediate representation of the IO-bound section

The compiler back-end (an OS- and platform-dependent stage) then translates the performance-critical section into library or system calls, or simply ignores it (if the target OS or platform does not support DVFS).
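As one possible lowering (a sketch, not part of the disclosure): on Linux, the runtime call could request a frequency change through the cpufreq sysfs interface, degrading to a no-op where that interface is unavailable. The sysfs path, the target frequency value, and the stub names below are assumptions:

```c
#include <stdio.h>

/* Build the (assumed) cpufreq sysfs path for a given core. */
int build_cpufreq_path(char *buf, size_t len, int cpu) {
    return snprintf(buf, len,
        "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
}

/* Possible lowering of the annotation: ask the OS to raise core 0's
   frequency; silently a no-op on platforms without DVFS support. */
void __enter_cpu_bound_section(void) {
    char path[128];
    build_cpufreq_path(path, sizeof path, 0);
    FILE *f = fopen(path, "w");
    if (!f) return;                 /* no DVFS: annotation costs nothing */
    fprintf(f, "%u\n", 2800000u);   /* requested frequency in kHz (assumed) */
    fclose(f);
}

void __exit_cpu_bound_section(void) {
    /* symmetric: restore the base frequency (omitted in this sketch) */
}
```

A real implementation would more likely use a dedicated system call or a privileged daemon, since writing to cpufreq sysfs files requires root and an appropriate scaling governor.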

It is up to the OS to decide how to schedule processes or threads that are running performance-critical sections, and this is orthogonal to this solution. We give two scenarios below to demonstrate how our scheme can be used to accelerate performance-critical sections and to reduce power consumption on multi-core processors that support heterogeneous frequencies. Again, heterogeneity of frequency means that a multi-core processor can use dynamic voltage scaling to boost the performance of active cores by reducing the power of inactive cores. The processor can also move threads that are running non-critical sections to cores with low voltage or frequency to save energy, because of the quadratic contribution of voltage to the dynamic power equation (P = 1/2 CV^2 f).
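The saving implied by the quadratic voltage term can be illustrated with a short calculation; the capacitance, voltage and frequency values below are illustrative assumptions, not measurements:

```c
/* Dynamic power of CMOS logic: P = 1/2 * C * V^2 * f. */
double dynamic_power(double cap_farads, double volts, double hertz) {
    return 0.5 * cap_farads * volts * volts * hertz;
}

/* Example: scaling a core from 1.2 V @ 2.0 GHz down to 0.9 V @ 1.0 GHz:
     dynamic_power(1e-9, 0.9, 1.0e9) / dynamic_power(1e-9, 1.2, 2.0e9)
   = (0.81 * 1.0) / (1.44 * 2.0) = 0.28125,
   i.e. roughly a 72% power reduction. Halving f alone would only halve P;
   the accompanying voltage drop makes the saving super-linear because V
   enters the equation quadratically. */
```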

Figure 5 is a hypothetical example illustrating how the OS could use the annotated sections to boost performance and to reduce power consumption. At time t0, no applications are executing critical sections, so the operating system can run all applications at a low (or base) frequency, leading to low power consumption. It is up to the OS to decide whether to run applications at a low frequency. In some domains, such as embedded applications, it is common to run applications on proce...