
Efficient Marker-Based Probe Implementation
Disclosure Number: IPCOM000177988D
Original Publication Date: 2009-Jan-12
Included in the Prior Art Database: 2009-Jan-12
Document File: 3 page(s) / 36K

Publishing Venue



The variable-length instructions used by the Intel/AMD x86 processor family pose challenges for insertion and removal of marker-based probes (which are sequences of instructions placed in a running program for analysis or debugging purposes). This disclosure describes an efficient algorithm for inserting and removing such probes.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 46% of the total text.


Efficient Marker-Based Probe Implementation


Marker-based probe points normally rely on substitution of fixed-length instructions or on use of single-byte breakpoint instructions (e.g., int 3 on Intel™ and AMD™ x86 variable-length-instruction systems). It is desirable to use branch instructions instead of expensive state-change instructions like int 3, but there are substantial obstacles to using branch instructions in multithreaded software (such as pthreads user-space code or code inside the OS kernel) on variable-length-instruction systems:

1. Because a branch instruction contains an address or offset, it might overlay multiple instructions. Other threads might: (1) be preempted at one of the overlaid instructions for an arbitrarily long period of time, (2) have taken a signal or interrupt with return address pointing to one of the overlaid instructions, or (3) have executed one of the overlaid instructions that is a short-form function call so that the return address again points to one of the overlaid instructions. Such a condition might persist for an arbitrarily long time, and in cases (2) and (3) might be extremely difficult to identify, particularly in code compiled without frame pointers.

2. Because branch instructions occupy multiple bytes, some other thread might attempt to execute the code while it is being overlaid, resulting in execution of arbitrary garbage instructions.

3. The instructions being overlaid might span an instruction-cacheline boundary, so that a given CPU might retain parts of the old and new instruction sequences in its cache, again resulting in execution of arbitrary garbage instructions.
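
Obstacles 1 and 3 can be made concrete with a small C sketch. Both helpers below are hypothetical illustrations rather than part of the disclosed method: `interior_boundaries` counts the instruction boundaries that a patch of a given length would cover (each one is an address at which another thread's saved instruction pointer could legitimately rest), and `spans_cache_line` reports whether the patched bytes straddle a cache-line boundary. The 64-byte line size used in the usage note is an assumption, not something the disclosure states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the lengths of the instructions that start at
 * a probe site, count the instruction boundaries that fall strictly inside
 * the first patch_len bytes. A nonzero result means the patch overlays more
 * than one instruction (obstacle 1). */
size_t interior_boundaries(const size_t *insn_len, size_t n, size_t patch_len)
{
    size_t offset = 0, count = 0;

    assert(insn_len != NULL);
    for (size_t i = 0; i < n; i++) {
        offset += insn_len[i];      /* offset where the next instruction starts */
        if (offset >= patch_len)
            break;                  /* next instruction starts outside the patch */
        count++;
    }
    return count;
}

/* Hypothetical helper: nonzero if the len bytes starting at addr touch more
 * than one cache line of line_size bytes (obstacle 3). */
int spans_cache_line(uintptr_t addr, size_t len, size_t line_size)
{
    assert(len > 0 && line_size > 0);
    return addr / line_size != (addr + len - 1) / line_size;
}
```

For instance, a five-byte jump written over instructions of lengths 1, 2, and 3 covers two interior boundaries, and a five-byte patch at address 0x103E crosses from one 64-byte line into the next.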

There is therefore a need to safely substitute low-overhead long-form jump instructions for sequences of code, as will be described in the following sections.

Algorithm for Placing Probes at Function Entry

The approach is to use (abuse) the profiling-hook feature present in some compilers. Such compilers permit insertion of arbitrary code upon entry to each function. The inserted code might enter and leave some sort of RCU read-side critical section as described under Prior Art, or might use the approach described in the next section.
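
As one possible illustration, GCC and Clang provide `-finstrument-functions`, which makes every instrumented function call `__cyg_profile_func_enter` on entry and `__cyg_profile_func_exit` on return. The disclosure does not name a compiler or flag, so treating this as the intended profiling hook is an assumption. In the sketch below, `probe_hits` and `probe_last_fn` are hypothetical names, and the hook body is only a placeholder for whatever the probe would really do (for example, the RCU read-side critical section mentioned above, which is not implemented here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe state: number of function entries seen, and the
 * address of the most recently entered function. */
unsigned long probe_hits;
void *probe_last_fn;

/* Called on entry to every function compiled with -finstrument-functions.
 * The attribute keeps the hook itself from being instrumented, which would
 * otherwise recurse without bound. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    assert(this_fn != NULL);
    /* GCC/Clang atomic builtins, because several threads may enter
     * functions concurrently. */
    __atomic_store_n(&probe_last_fn, this_fn, __ATOMIC_RELAXED);
    __atomic_fetch_add(&probe_hits, 1, __ATOMIC_RELAXED);
}

/* Matching exit hook; intentionally empty in this sketch. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
}
```

Building the program under test with `-finstrument-functions` and linking in these two definitions would route each function entry through the first hook; without the flag the hooks are simply never called.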



Algorithm for Placing Probes on Markers

The following figure depicts marker insertion for this method. The marker consists of a jump instruction with an address field large enough to reach the corresponding patch buffer, but that simply branches to the next instruction. In this case, the jump instruction is assumed to have a single-byte opcode and a four-byte address field, though other layouts could work as well. Such a branch instruction is effectively a no-op, and has suitably low overhead.
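
On x86, one instruction with this layout is the near relative jump `jmp rel32`: opcode byte 0xE9 followed by a four-byte little-endian displacement measured from the address of the next instruction. A displacement of zero therefore branches to the next instruction. The C sketch below encodes such a jump under that assumption; the disclosure does not say which encoding it uses, and the function names `encode_jmp_rel32` and `encode_marker` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { JMP_REL32_OPCODE = 0xE9, JMP_REL32_LEN = 5 };

/* Encode `jmp rel32` as it would appear at insn_addr when jumping to target.
 * The displacement is relative to the address of the NEXT instruction.
 * Returns 0 on success, -1 if target is out of rel32 range. */
int encode_jmp_rel32(uint8_t out[JMP_REL32_LEN],
                     uint64_t insn_addr, uint64_t target)
{
    int64_t rel = (int64_t)(target - (insn_addr + JMP_REL32_LEN));
    uint32_t bits;

    assert(out != 0);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;

    bits = (uint32_t)(int32_t)rel;
    out[0] = JMP_REL32_OPCODE;
    for (int i = 0; i < 4; i++)             /* little-endian displacement */
        out[1 + i] = (uint8_t)(bits >> (8 * i));
    return 0;
}

/* The marker: a jump whose target is simply the following instruction,
 * so its displacement is zero and it behaves as a five-byte no-op. */
int encode_marker(uint8_t out[JMP_REL32_LEN], uint64_t insn_addr)
{
    return encode_jmp_rel32(out, insn_addr, insn_addr + JMP_REL32_LEN);
}
```

With this encoding the marker is the byte sequence E9 00 00 00 00 regardless of where it sits, and redirecting it to a patch buffer only changes the four displacement bytes.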

First, replace the first byte of the jump instruction with a single-byte breakpoint/trap instruction (denoted by BT), as shown below:
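
The original figure is not available in this text. As a stand-in, the byte-level effect of this step can be sketched in C, assuming BT is the x86 `int 3` instruction (opcode 0xCC) mentioned earlier and assuming the GCC/Clang `__atomic_exchange_n` builtin. The function name `arm_breakpoint` is hypothetical, and the sketch operates on an ordinary writable byte buffer; patching live code would additionally require write access to the code page, which is outside its scope.

```c
#include <assert.h>
#include <stdint.h>

enum { INT3_OPCODE = 0xCC };   /* single-byte x86 breakpoint instruction */

/* Overwrite only the first byte of the instruction at site with the
 * breakpoint opcode, using one atomic single-byte exchange so that no other
 * thread can observe a partially written instruction. Returns the byte that
 * was displaced so the caller can restore it later. */
uint8_t arm_breakpoint(uint8_t *site)
{
    assert(site != 0);
    return __atomic_exchange_n(site, (uint8_t)INT3_OPCODE, __ATOMIC_SEQ_CST);
}
```

Applied to a marker encoded as E9 00 00 00 00, this yields CC 00 00 00 00 and returns 0xE9. The four address bytes are left untouched by this step.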

Note that new executions will ta...