
Avoiding unnecessary wakeups when waking up CPUs at the completion of a grace period
Disclosure Number: IPCOM000216206D
Publication Date: 2012-Mar-25
Document File: 3 page(s) / 35K

Publishing Venue

The Prior Art Database


Currently RCU has limited information on the time-criticality of a given RCU callback, so, with the exception of callbacks registered by kfree_rcu(), RCU takes a conservative approach, invoking callbacks aggressively. Because most callbacks are not time-critical, this aggressive approach wastes energy. Therefore, there is a need to inform RCU of the time-criticality of individual callbacks to promote green computing. One such approach is described below.

This is the abbreviated version, containing approximately 46% of the total text.



The RCU implementation in the 3.3-rc1 Linux* kernel conserves energy on servers with many CPUs that are running light workloads when the CONFIG_RCU_FAST_NO_HZ kernel configuration parameter is selected. That implementation conserves energy by allowing CPUs to enter dyntick-idle mode even though they have RCU callbacks pending. However, there is often no urgency in processing the callbacks, many of which simply free memory -- and if there is plenty of free memory, then deferring the freeing further might allow it to happen later, when that CPU needs to wake up for some other reason. Such deferring would allow the CPU to remain idle longer, thus allowing it to reach lower power states.

However, it is not safe to defer unconditionally, because some RCU callbacks are time-critical: deferring them can decrease performance below tolerable levels (which might in turn indirectly impact energy efficiency) -- or even result in system hangs.

What is needed is some way to distinguish time-critical RCU callbacks from non-time-critical ones, allowing the end-of-grace-period wakeup to be deferred when all of the callbacks on the CPU in question are non-time-critical.

There are two aspects of this invention: (1) deducing time-criticality based on the source of the callback and (2) introducing an explicit API that provides time-criticality hints.

Deducing time-criticality based on the source of the callback

Callbacks are posted via the call_rcu(), synchronize_rcu(), rcu_barrier(), and kfree_rcu() API members, along with their counterparts for other flavors of RCU. The idea here is to maintain a per-CPU/per-RCU-flavor count of the number of time-critical callbacks (or equivalently, a count of the non-time-critical callbacks). Then when a given grace period ends, only those CPUs having at least one time-critical callback are sent IPIs. The time-critical callbacks are those posted by synchronize_rcu() and rcu_barrier(), while callbacks posted by kfree_rcu() are known to be non-time-critical, at least assuming that ample free memory is available. Callbacks posted by call_rcu() might or might not be time-critical, so this portion of the invention conservatively assumes that they are time-critical.
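The counting scheme described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the names cb_source, cpu_cbs, qlen_urgent, post_callback(), and cpus_to_wake() are hypothetical, a single RCU flavor is modeled, and a bitmask stands in for the set of CPUs that would be sent IPIs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* Source of a callback, mirroring the API members named in the text. */
enum cb_source {
    CB_CALL_RCU,
    CB_SYNCHRONIZE_RCU,
    CB_RCU_BARRIER,
    CB_KFREE_RCU
};

/* Hypothetical per-CPU bookkeeping; NOT the kernel's struct rcu_data. */
struct cpu_cbs {
    long qlen;        /* all callbacks queued on this CPU */
    long qlen_urgent; /* time-critical subset of qlen */
};

static struct cpu_cbs cpus[NR_CPUS];

/*
 * Only kfree_rcu() callbacks are known to be non-time-critical.
 * call_rcu() callbacks are conservatively treated as time-critical.
 */
static bool source_is_urgent(enum cb_source src)
{
    return src != CB_KFREE_RCU;
}

/* Record a newly posted callback on the given CPU. */
static void post_callback(int cpu, enum cb_source src)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpus[cpu].qlen++;
    if (source_is_urgent(src))
        cpus[cpu].qlen_urgent++;
}

/*
 * At the end of a grace period, build a bitmask of the CPUs that
 * hold at least one time-critical callback. Only these CPUs would
 * be sent IPIs; the others may stay idle.
 */
static unsigned long cpus_to_wake(void)
{
    unsigned long mask = 0;
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpus[cpu].qlen_urgent > 0)
            mask |= 1UL << cpu;
    return mask;
}
```

In this model, a CPU whose queue holds only kfree_rcu() callbacks never appears in the mask, so it is left undisturbed until it wakes for some other reason. Counting the non-time-critical callbacks instead is equivalent: a CPU then needs an IPI when its total count exceeds its non-urgent count.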

The simplest implementation makes kfree_rcu() increment a new field (named, for example, ->qlen_nonurgent) in the rcu_data structure as follows:

static void
__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
           struct rcu_state *rsp)
{
        unsigned long flags;
        struct rcu_data *rdp;

        head->func = func;
        head->next = NULL;

        smp_mb(); /* Ensure RCU update seen before callback registry. */

        /*
         * Opportunistically note grace-period endings and beginnings.
         * Note that we might see a beginning right after we see an
         * end, but never vice versa, since this CPU has to pass through
         * a quiescent state betweentimes.
         */
