
Method for Concurrent Implementation of Reference Counts by Using Hardware Transactional Memory (HTM)

IP.com Disclosure Number: IPCOM000238297D
Publication Date: 2014-Aug-15

Publishing Venue

The IP.com Prior Art Database



This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 17% of the total text.



Reference counts are widely used in multi-threaded software as a mechanism for identifying potential victim objects (e.g., for page replacement or garbage collection). In this mechanism, each shared object has a reference count, which is increased by one every time a thread intends to access the object, as a means of expressing "interest" in it (the link operation). Once a thread has finished accessing the object, it decrements the reference count (the unlink operation). During collection or replacement, an object is considered a victim candidate only if its reference count is zero. Example uses of reference counts include garbage collection in Java, shared pointers in the C++ Boost library, and data page caching in database management systems such as DB2.

In order to guarantee functional correctness, accesses to a reference count must be protected by a lock (commonly implemented as a mutex), which serializes the modifications and reads of the reference count made by concurrently executing threads. Pseudo code for the three main operations on the reference count is given below.

link (object o, mutex m) {

     m.acquire();
     o.ref_count++;
     m.release(); }

unlink (object o, mutex m) {

     m.acquire();
     o.ref_count--;
     m.release(); }

collect (object o, mutex m) {

     m.acquire();
     if (o.ref_count == 0) {
          /* reclaim the object */
     }
     m.release(); }

Note that the mutex m is passed as a parameter to allow for different locking granularities. At one extreme, every single object has its own mutex. This allows for maximum concurrency with the lock-based implementation but introduces a large (often unacceptable) overhead of managing a potentially huge number of locks. At the other extreme, all objects are protected by a single mutex. This alternative is much simpler to manage, but allows for little concurrency, as simultaneous accesses to different objects are unnecessarily serialized.

Such a synchronization point can be a significant source of poor scaling, especially when a large number of threads attempt to modify the reference count. The figure below shows this effect using a simple multi-threaded micro-benchmark in which individual threads repeatedly perform the link and unlink operations on a single shared object for a fixed number of iterations. To obtain predictable results, each thread is bound to its own dedicated processor. The test is run on a 32-core server based on the IBM POWER7 processor, with 8 cores per processor chip. As can be seen from the chart, the execution time increases dramatically once the number of threads (and processors) exceeds 8 cores.

This invention is a method based on Hardware Transactional Memory (HTM) to implement reference counts in a way that allows multiple threads to access or update a shared object's reference count concurrently, without always requiring strict...