Browse Prior Art Database

Method for Concurrent Implementation of Reference Counts by Using Hardware Transactional Memory (HTM) Disclosure Number: IPCOM000238297D
Publication Date: 2014-Aug-15

Publishing Venue

The Prior Art Database



This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 17% of the total text.

Page 01 of 10

Method for Concurrent Implementation of Reference Counts by Using Hardware Transactional Memory (HTM)

Reference counts are widely used in many multi-threaded software as a mechanism for identifying potential victim objects (e.g., page replacement or garbage collection). In this mechanism, each shared object has a reference count which is increased by one every time a thread intends to access the object as a means to express "interest" in the object (the link operation). Once a thread is finished with accessing the object, it decrements the reference count (the unlink operation). During the process of collection or replacement, an object is considered a victim candidate only if the reference count value becomes zero. Examples usage for reference counts include garbage collection in Java, shared pointers in C++ boost library, and data page caching in database management systems such as DB2.

In order to provide functional correctness, access to reference counts must be protected by a lock (commonly implemented by a mutex construct) which serializes the modifications and accesses to the reference count made by concurrently executing threads. A pseudo code for the three main operations on the reference count is described below.

link (object o, mutex m) {


     m.release(); }

unlink (object o, mutex m) {


     m.release(); }

collect (object o, mutex m) {

if (o.ref_count == 0) {


     m.release(); }

Note that the mutex m is passed as a parameter to allow for different locking granularity. In one extreme, every single object has its own mutex. This allows


Page 02 of 10

for maximum concurrency with the lock-based implementation but introduces a large (often unacceptable) overhead of managing a potentially huge number of locks. In the other extreme, all objects are protected by a single mutex. This alternative is much simpler to manage, but allows for little concurrency as simultaneous accesses to different objects are unnecessarily serialized.

Such a synchronization point can be a significant source for lack of scaling especially when a large number of threads attempt to modify the reference count. The figure below shows this effect by using a simple, multi-threaded micro-benchmark in which individual threads periodically perform the link and unlink operations on a single shared object for a fixed number of iterations. To obtain predictable results, each thread is bound to its own dedicated processor. The test is run on a 32-core server based on IBM POWER7 processor with 8 cores in each processor chip. As it can be seen from the chart, the execution time is dramatically increased when the number of threads (and processors) is beyond 8 cores.

This invention is a method based on Hardware Transactional Memory (HTM) to implement reference count in a way that multiple threads can access or update a shared object's reference count concurrently without alwaysrequiring strict...