Tracing Hypervisor Activity
Original Publication Date: 2003-Jun-18
Included in the Prior Art Database: 2003-Jun-18
Operating system tracing, based on trace hooks placed strategically throughout a kernel (including profile sampling trace hooks), is a primary means of OS performance tuning. The recent addition of logical partitioning (LPAR) introduced a new layer of code between the hardware and the OS: the hypervisor. Unfortunately, when the machine is executing hypervisor code, ambiguities arise that make it impossible to distinguish hypervisor addresses from real mode addresses in the trace/profile data. Profile samples taken within the hypervisor and samples taken during real mode (exception handling, etc.) cannot be told apart, which makes performance analysis in either area difficult or impossible. As logical partitioning evolves and becomes more complex, it is increasingly important to collect accurate profile information to allow performance analysis and tuning of the hypervisor software layer.
This disclosure couples hardware performance monitor (PM) event-based sampling with the processor timebase (TB) register and describes a technique for distinguishing hypervisor activity from real mode activity, thus allowing performance analysis and tuning in both of these modes of processor execution.
One solution to this problem would be to place entry and exit trace hooks around the code that switches to hypervisor mode; any profile samples occurring between these entry and exit trace hook markers would clearly be due to hypervisor activity and could be categorized accordingly. However, hypervisor calls include virtual memory management (VMM) activity, and tracing during this time, when the machine state is unsafe due to VMM state changes, is rarely possible.
Another solution would be to save hypervisor entry and exit timestamps to a small, per-CPU, kernel-pinned (bolted), VMM-safe ring buffer. A timestamped PM profile sample could then be bounds-checked against each entry in the ring buffer and categorized based on the result (i.e., if the profile timestamp falls between any hypervisor entry:exit pair in the ring buffer, the profile sample was taken during hypervisor mode).
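The ring buffer check described above can be illustrated with a minimal sketch in C. It is not the disclosed implementation: the names hv_ring, hv_interval, hv_ring_record, hv_ring_contains and the slot count HV_RING_SLOTS are hypothetical, the timestamps are assumed to be 64-bit timebase values, and kernel concerns such as pinning the memory and per-CPU allocation are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HV_RING_SLOTS 8  /* hypothetical size, chosen only for illustration */

/* One completed hypervisor call: timebase value at entry and at exit. */
struct hv_interval {
    uint64_t entry_tb;
    uint64_t exit_tb;
};

/* Hypothetical per-CPU ring; must start zero-initialized. */
struct hv_ring {
    struct hv_interval slot[HV_RING_SLOTS];
    size_t next;   /* index of the slot that will be overwritten next */
    size_t count;  /* number of valid slots, at most HV_RING_SLOTS */
};

/* Record one entry:exit pair, overwriting the oldest pair when full. */
static void hv_ring_record(struct hv_ring *r, uint64_t entry_tb, uint64_t exit_tb)
{
    assert(entry_tb <= exit_tb);
    r->slot[r->next].entry_tb = entry_tb;
    r->slot[r->next].exit_tb = exit_tb;
    r->next = (r->next + 1) % HV_RING_SLOTS;
    if (r->count < HV_RING_SLOTS)
        r->count++;
}

/* Return true if sample_tb lies within any recorded entry:exit pair. */
static bool hv_ring_contains(const struct hv_ring *r, uint64_t sample_tb)
{
    for (size_t i = 0; i < r->count; i++) {
        if (sample_tb >= r->slot[i].entry_tb && sample_tb <= r->slot[i].exit_tb)
            return true;
    }
    return false;
}
```

A profile sample whose timestamp makes hv_ring_contains return true would be attributed to hypervisor mode. Note that the check presumes each sample carries a timestamp, and that a small ring only remembers the most recent calls, so samples must be classified before their interval is overwritten.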
However, another problem arises: hardware PM event-based sampling captures (samples) the current processor IAR, saves it to the SIA, and signals a PM exception. Many processor cycles may elapse before the PM exception is serviced, especially when interrupts are disabled. The SIA is not timestamped, so when the PM exception handler is called and reads the SIA, it does not know precisely when the sample occurred. Therefore, real/hypervisor mode ambiguities between PM samples cannot be resolved using either a trace-based approach or an event ring buffer unless the SIA samples can somehow be timestamped.
The solution presented here is as follows:
The PM h/w is programmed to count processor cycles (the typical event used for PM event based sampling/profiling); the PM exception handler reloads the PMC, which is counting processor cycles, with N and the very next instruc...