Efficient power management in idle nodes
Disclosure Number: IPCOM000237207D
Publication Date: 2014-Jun-08
Document File: 3 page(s) / 97K

Publishing Venue

The Prior Art Database


A method for efficient power management of idle nodes in a cluster is disclosed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 27% of the total text.

Disclosed is a method for efficient power management of idle nodes in a cluster.

Power and energy consumption are significant issues in High Performance Computing (HPC), where customers purchase very large cluster systems with thousands of compute nodes. The problem is growing worse with time: the user community wants to solve ever larger problems, so the average size of a compute cluster keeps increasing. Compute clusters are rarely kept 100% utilized; typically a cluster is utilized about 80% to 90% of the time. Since it is common practice to use a job scheduler/resource manager with compute clusters, ideas have been proposed and implemented to manage power and energy through job schedulers. For example, when a job scheduler does not see enough jobs in the job queue to keep a certain number of compute nodes busy, it can choose to shut them down to save power and energy. However, powering down a node has a severe drawback: the time it takes to shut down and reboot the node. By comparison, suspending to S3 state takes about 30 seconds, and resuming from S3 to a normal state takes less than a minute.

A second approach is to manage power consumption via the operating system (OS). Modern microprocessors support multiple idle states, called C states. In the current generation INTEL XEON PHI® Sandy Bridge processor, 6 levels of C states are supported. For example, for a 2-socket node, power consumption in C3 is about 120W, while in C6 it is about 80W. Putting a node in C6 is preferable to shutting it down, as bringing the node back to a normal state takes only a few seconds. On the other hand, even in C6 state there is still a considerable amount of power consumption.

The disclosed method leverages a "deep sleep" standby state called S3. A 2-socket node in S3 state consumes only about 20W, as opposed to 80W in C6 state. Suspending a node to S3 state and bringing it back takes less than 2 minutes. But suspending and resuming nodes from S3 still has a limitation. In S3 state, the memory core power rail is left on to preserve the contents of memory, and there is a maximum number of power cycles that the hardware components are qualified for. For this reason, the OS is not allowed to suspend and resume nodes from S3, as the number of transitions could easily exceed the maximum number of cycles the hardware has been qualified for. If the number of power cycles exceeds the qualified limit, it could lead to hardware component failure, thus affecting overall hardware reliability. The decision as to when to transition a specific node in a cluster to S3 state has to be implemented by a mechanism that is external to the OS. The mechanism also needs to take into account if the maximum qualified power cycles limit for a n...
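
The trade-off described above can be illustrated with a minimal Python sketch. It is not the disclosed mechanism, whose details are cut off in this abbreviated text. The power figures (120W, 80W, 20W) come from the paragraph above; the Node class, the per-node cycle counter and limit, and the policy of suspending the nodes with the most remaining cycles first are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Approximate idle power of a 2-socket node in watts, using the figures quoted above.
IDLE_POWER_W = {"C3": 120.0, "C6": 80.0, "S3": 20.0}


@dataclass
class Node:
    name: str
    s3_cycles_used: int   # suspend/resume cycles performed so far (hypothetical counter)
    s3_cycles_limit: int  # qualified maximum for this node (hypothetical value)

    @property
    def cycles_left(self) -> int:
        return max(self.s3_cycles_limit - self.s3_cycles_used, 0)


def energy_saved_wh(idle_hours: float, from_state: str = "C6", to_state: str = "S3") -> float:
    """Watt-hours saved by idling in to_state rather than from_state."""
    return (IDLE_POWER_W[from_state] - IDLE_POWER_W[to_state]) * idle_hours


def select_for_s3(idle_nodes: list[Node], count: int) -> list[str]:
    """Pick up to `count` idle nodes to suspend, skipping nodes with no cycles left.

    Assumed policy: prefer nodes with the most remaining cycles so wear is spread.
    """
    eligible = [n for n in idle_nodes if n.cycles_left > 0]
    eligible.sort(key=lambda n: n.cycles_left, reverse=True)
    return [n.name for n in eligible[:count]]


if __name__ == "__main__":
    nodes = [Node("n1", 90, 100), Node("n2", 100, 100), Node("n3", 10, 100)]
    print(select_for_s3(nodes, 2))  # ['n3', 'n1']
    print(energy_saved_wh(10))      # 600.0
```

In this sketch, node n2 is never suspended because it has reached its assumed limit, and a node left idle for 10 hours in S3 instead of C6 saves about 600 Wh. An external controller such as a job scheduler plugin could run a check of this kind before issuing a suspend request.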