Method for improving the accuracy and efficiency of exception based instruction tracing
Publication Date: 2014-Aug-26
The IP.com Prior Art Database
Instruction tracing can use exceptions to stop at every branch in the code and gather the data needed to construct the program flow within the branch. An exception can occur at every branch if the branch related bit is enabled; the step related bit can likewise be used to stop at every step. In a tracing tool, the number of cycles required to complete a block of instructions is computed on the assumption that the number of cycles required to complete one instruction is always constant. Further, trace exceptions occur at a rate that lets operating system decrementer interrupts fire between them, which clutters the instruction flow captured for the main target program. The article outlines ways to improve the accuracy of computing the total number of cycles required to complete instruction blocks, and to keep the instruction flow for the traced program uncluttered.
Instruction tracing can use exceptions to stop at every branch in the code and gather the data needed to construct the program flow within the branch. An exception can occur at every branch if the MSR_BE (Branch Enable) bit is enabled. The MSR_SE (Step Enable) bit can likewise be used to stop at every step, that is, after every instruction.
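The following sketch models how the two trace modes partition an executed instruction stream into the blocks the tool sees. It assumes an exception is taken after each branch in branch mode (MSR_BE set) and after every instruction in step mode (MSR_SE set). The function `split_into_blocks`, the mnemonic set, and the sample stream are hypothetical illustrations, not part of AIX ctrace.

```python
from typing import List

# Illustrative subset of Power branch mnemonics (assumption, not exhaustive).
BRANCH_MNEMONICS = {"b", "bl", "bc", "beq", "bne", "bdnz", "blr", "bctr"}


def split_into_blocks(stream: List[str], mode: str) -> List[List[str]]:
    """Split an executed instruction stream into the blocks seen by the tracer.

    mode="branch": an exception is assumed after every branch (MSR_BE set).
    mode="step":   an exception is assumed after every instruction (MSR_SE set).
    Each block ends with the instruction that triggered the exception.
    """
    if mode not in ("branch", "step"):
        raise ValueError(f"unknown mode: {mode}")
    blocks: List[List[str]] = []
    current: List[str] = []
    for insn in stream:
        current.append(insn)
        if mode == "step" or insn in BRANCH_MNEMONICS:
            blocks.append(current)  # an exception closes the current block
            current = []
    if current:  # trailing instructions not yet closed by an exception
        blocks.append(current)
    return blocks


stream = ["ld", "addi", "cmpw", "bne", "add", "rotlwi", "std", "blr"]
print([len(b) for b in split_into_blocks(stream, "branch")])  # [4, 4]
print([len(b) for b in split_into_blocks(stream, "step")])    # [1, 1, 1, 1, 1, 1, 1, 1]
```

In step mode every block holds exactly one instruction, which is the special case discussed for step mode tracing.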
1) While running an exception based instruction tracing tool (such as AIX ctrace), the cycles per instruction (CPI) parameter is used to compute the total number of processor cycles consumed to complete a block of instructions. In branch mode tracing, this block is the set of instructions executed between two consecutive exceptions. Because CPI is measured in cycles per instruction, the number of cycles consumed to complete the block of instructions = number of instructions in the block × CPI.
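A minimal sketch of the fixed-CPI estimate: since CPI is cycles per instruction, the estimated cycle count for a block is its instruction count multiplied by the CPI. The function name, the block sizes, and the CPI value of 1.5 are assumptions chosen for illustration.

```python
def estimate_block_cycles(num_instructions: int, cpi: float) -> float:
    """Fixed-CPI estimate: cycles = instructions * (cycles per instruction)."""
    if num_instructions < 0 or cpi <= 0:
        raise ValueError("instruction count must be >= 0 and CPI > 0")
    return num_instructions * cpi


# Hypothetical sizes of blocks between consecutive branch-mode exceptions and
# an assumed CPI; these numbers are illustrative, not measured.
block_sizes = [4, 12, 7]
cpi = 1.5
print([estimate_block_cycles(n, cpi) for n in block_sizes])  # [6.0, 18.0, 10.5]
```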
However, this assumes that the number of cycles required to complete an instruction is constant, which is not the case. The number of cycles required to complete a load / store instruction, for example, can vary widely and can differ greatly from that required for an add / rotate instruction. Therefore, the number of cycles computed to complete the block of instructions is not accurate enough.
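To illustrate why a constant CPI is inaccurate, the sketch below compares the fixed-CPI estimate with a sum of per-instruction costs for two blocks of equal length. The cycle counts in `ASSUMED_CYCLES` are hypothetical placeholders, not measured latencies for any processor; real costs depend on cache behavior and pipeline state.

```python
from typing import Dict, List

# Hypothetical per-instruction cycle costs, for illustration only.
ASSUMED_CYCLES: Dict[str, int] = {
    "add": 1,
    "rotlwi": 1,
    "ld": 4,   # assumed L1 cache hit; a miss would cost far more
    "std": 2,
}


def fixed_cpi_cycles(block: List[str], cpi: float) -> float:
    """Cycle estimate that assumes every instruction costs the same."""
    return len(block) * cpi


def per_class_cycles(block: List[str], costs: Dict[str, int]) -> int:
    """Cycle estimate that sums a (modeled) cost for each instruction."""
    return sum(costs[insn] for insn in block)


alu_block = ["add", "rotlwi", "add", "rotlwi"]
mem_block = ["ld", "ld", "std", "ld"]
cpi = 2.0
for name, block in (("alu", alu_block), ("mem", mem_block)):
    print(name, fixed_cpi_cycles(block, cpi), per_class_cycles(block, ASSUMED_CYCLES))
# alu 8.0 4
# mem 8.0 14
```

Both blocks receive the same fixed-CPI estimate even though their modeled costs differ by a factor of 3.5.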
The same problem applies to step mode tracing, where an exception is taken at each executed instruction; the only difference is that in step mode tracing the instruction block comprises just one instruction.
2) While running an exception based instruction tracing tool, trace exceptions occur at such a rate (i.e. with such a time interval between two consecutive exceptions) that decrementer exceptions occur between two consecutive trace exceptions. Each decrementer exception causes the clock interrupt to be handled in Operating System (AIX) code. As a result, the instruction flow captured by the tracing tool for a target program is cluttered with clock interrupt handling code, leaving very little room for instructions from the actual target program of interest.
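The sketch below counts, for each gap between consecutive trace exceptions, how many decrementer expirations fall inside it; each such expiration means clock interrupt handling code would appear in the captured flow before the next target-program instruction. The 10 ms decrementer period, the timestamps, and all function names are hypothetical assumptions for illustration.

```python
import bisect
from typing import List


def decrementer_ticks(period_ms: int, first_ms: int, until_ms: int) -> List[int]:
    """Times at which a periodic decrementer is assumed to expire, before until_ms."""
    return list(range(first_ms, until_ms, period_ms))


def ticks_between_exceptions(trace_ms: List[int], ticks_ms: List[int]) -> List[int]:
    """For each gap between consecutive trace exceptions, count the decrementer
    expirations strictly inside that gap."""
    ticks = sorted(ticks_ms)
    counts = []
    for start, end in zip(trace_ms, trace_ms[1:]):
        lo = bisect.bisect_right(ticks, start)  # first tick after start
        hi = bisect.bisect_left(ticks, end)     # first tick at or after end
        counts.append(max(0, hi - lo))          # guard against non-increasing input
    return counts


def polluted_fraction(trace_ms: List[int], ticks_ms: List[int]) -> float:
    """Fraction of trace gaps that contain at least one decrementer expiration."""
    counts = ticks_between_exceptions(trace_ms, ticks_ms)
    return sum(1 for c in counts if c > 0) / len(counts) if counts else 0.0


# Assumed 10 ms decrementer period; all timestamps are illustrative.
fast_trace = list(range(0, 21, 2))   # a trace exception every 2 ms
fast_ticks = decrementer_ticks(10, 5, 21)
slow_trace = [0, 15, 30, 45]         # a trace exception every 15 ms
slow_ticks = decrementer_ticks(10, 5, 46)

print(polluted_fraction(fast_trace, fast_ticks))  # 0.2
print(polluted_fraction(slow_trace, slow_ticks))  # 1.0
```

Once the interval between trace exceptions exceeds the decrementer period, every gap in this model contains a clock interrupt.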
Improving the accuracy of the cycle count computed for an instruction block:
In this context, the existing Performance Monitoring Unit (PMU) / Performance Monitoring Counters (PMC) facility was investigated.
The PMU today provides two methods of data collection:
This allows measuring the cycles for the instructions that are part of the selected sample.
However, sampling is available only at a limited rate: for processor cycles, the minimum sampling interval is 1 ms, whereas an instruction completes in the range of 2-4 cycles, far less than the minimum sampling interval available from the PMU.
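A quick calculation shows the size of the gap. Assuming a 4 GHz core clock (a hypothetical figure; the text does not state one), a single 1 ms sampling interval spans 4,000,000 cycles, i.e. roughly 1 to 2 million instructions at 2-4 cycles per instruction. Integer nanoseconds are used to keep the arithmetic exact.

```python
def cycles_in_interval(interval_ns: int, freq_ghz: int) -> int:
    """Cycles elapsed in an interval: at f GHz the core runs f cycles per ns."""
    return interval_ns * freq_ghz


MIN_SAMPLE_INTERVAL_NS = 1_000_000  # 1 ms minimum sampling interval, per the text
ASSUMED_FREQ_GHZ = 4                # hypothetical clock frequency

cycles = cycles_in_interval(MIN_SAMPLE_INTERVAL_NS, ASSUMED_FREQ_GHZ)
print(cycles)  # 4000000
for cycles_per_insn in (2, 4):
    # Number of instructions that complete within one sampling interval.
    print(cycles_per_insn, cycles // cycles_per_insn)
# 2 2000000
# 4 1000000
```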
For the purposes of the instruction tracing tool under discussion, we need to obtain the cycle count at the granularity of a single instruction, which is not currently possible using the sampling method offered by the PMU.
PMU allows thres...