Mechanism to Reduce the Overhead of Hardware Based Profiling

IP.com Disclosure Number: IPCOM000241323D
Publication Date: 2015-Apr-17
Document File: 3 page(s) / 44K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to reduce polling overhead by dynamically adjusting how often the Program Buffer occupancy is polled, based on the central processing unit (CPU) time that an application thread consumes. The solution works because reading the CPU time consumed by a thread is up to four times cheaper than de-authorizing and re-authorizing the thread for Runtime Instrumentation (RI).

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Profile Guided Optimization (PGO) is used to boost the performance of software applications. The technique is especially powerful in the context of a Just-In-Time (JIT) compiler, which can make optimization decisions at runtime based on the observed program behavior. The main drawback is that profiling introduces a non-negligible runtime overhead, which, at least temporarily, degrades the performance of the running application. To mitigate this problem, some processor vendors build profiling support directly into the central processing unit (CPU).

The Runtime Instrumentation (RI) Facility is a hardware feature implemented in some processors that allows the hardware to collect profiling information with minimal overhead. In one set of processors, the user can program the RI Facility to collect certain data, after which the hardware automatically collects the data and stores it sequentially in a per-thread user buffer (i.e., the Program Buffer). When the Program Buffer is full, the RI hardware mechanism stops, but the system does not send a notification event to the software. Thus, the application code must periodically poll the status of the Program Buffer.

Polling the occupancy of the Program Buffer is an expensive operation. First, the system must de-authorize the thread for RI. Next, the system can retrieve the status of the Program Buffer. Finally, the system must re-authorize the thread for RI operation.
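The three-step poll described above can be modeled in software as follows. This is only an illustrative sketch: `ProgramBuffer`, `record`, and `poll_occupancy` are hypothetical names, not part of any RI interface, and the assumption that nothing is collected while the thread is de-authorized is ours.

```python
from dataclasses import dataclass


@dataclass
class ProgramBuffer:
    """Hypothetical software model of a per-thread RI Program Buffer."""
    capacity: int
    used: int = 0
    authorized: bool = True

    def record(self, n: int) -> int:
        """Model the hardware storing n records; return how many were lost."""
        if not self.authorized:
            # Assumption: no data is collected while de-authorized.
            return n
        stored = min(n, self.capacity - self.used)
        self.used += stored
        # Once the buffer is full, collection stops silently.
        return n - stored


def poll_occupancy(buf: ProgramBuffer) -> float:
    """Run the three-step poll and return the buffer fill ratio."""
    buf.authorized = False                  # 1. de-authorize the thread for RI
    try:
        return buf.used / buf.capacity      # 2. read the Program Buffer status
    finally:
        buf.authorized = True               # 3. re-authorize the thread for RI
```

The model makes the cost structure visible: every poll pays for steps 1 and 3 even when step 2 reports a nearly empty buffer.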

A method is needed to reduce this polling overhead.

One trivial solution is less frequent polling. The disadvantage of this solution is that a long polling period may allow the Program Buffer to fill up, in which case some profiling information might be lost. Statically tuning the polling frequency does not work either, because different threads may perform different amounts of work. Moreover, the activity of a thread may not be constant over time; at some times it may execute a lot of code, while at other times it may be mostly waiting for work.
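A small simulation illustrates why no single fixed period suits every thread. The workloads, the capacity, and the assumption that each poll drains the buffer are all hypothetical choices made for this sketch; they are not taken from the disclosure.

```python
def simulate_fixed_polling(records_per_tick, period, capacity):
    """Return (polls, lost_records) when polling every `period` ticks.

    records_per_tick: records the thread produces in each tick
    (a made-up workload used only for illustration).
    """
    used = polls = lost = 0
    for tick, produced in enumerate(records_per_tick, start=1):
        stored = min(produced, capacity - used)
        used += stored
        lost += produced - stored   # buffer full: hardware stops silently
        if tick % period == 0:
            polls += 1
            used = 0                # assumption: a poll drains the buffer
    return polls, lost


busy = [10] * 100   # thread that executes a lot of code
idle = [0] * 100    # thread that mostly waits for work

busy_short = simulate_fixed_polling(busy, period=5, capacity=50)    # (20, 0)
busy_long = simulate_fixed_polling(busy, period=10, capacity=50)    # (10, 500)
idle_short = simulate_fixed_polling(idle, period=5, capacity=50)    # (20, 0)
```

With the short period, the idle thread pays for 20 polls that find nothing; with the long period, the busy thread loses half of its records.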

The novel solution is a method to dynamically change the frequency for polling the Program Buffer occupancy based on the CPU time an application thread consumes. The idea is that if a thread spends more time on a processor, then it is likely to produce more profiling data to fill the buffer. Conversely, if a thread consumes less CPU time (e.g., because it is mostly idle waiting for work or is not scheduled by the operating system on any CPU), then it cannot produce much data and the likelihood of filling the buffer is small. The solution works because reading the CPU time consumed by a thread is up to four times cheaper than de-authorizing and re-authorizing the thread for RI.
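The idea in the paragraph above can be sketched as a poller that performs the expensive buffer poll only after the thread has consumed a given amount of CPU time since the last poll. This is a minimal illustration of the principle, not the disclosed algorithm: the class name, the `cpu_budget_s` threshold, and the injected `poll_buffer` callback are assumptions. Python's `time.thread_time()` stands in for whatever cheap per-thread CPU-time query the platform offers.

```python
import time


class CpuTimeGatedPoller:
    """Skip the expensive poll until enough thread CPU time has elapsed."""

    def __init__(self, poll_buffer, cpu_budget_s, read_cpu_time=time.thread_time):
        # poll_buffer: callable performing the expensive poll
        #   (de-authorize, read status, re-authorize); hypothetical.
        # cpu_budget_s: CPU seconds the thread may consume between polls;
        #   a hypothetical tuning parameter.
        # read_cpu_time: cheap per-thread CPU-time reader; the default must
        #   be called from the profiled thread itself.
        self._poll_buffer = poll_buffer
        self._cpu_budget_s = cpu_budget_s
        self._read_cpu_time = read_cpu_time
        self._cpu_at_last_poll = read_cpu_time()

    def maybe_poll(self):
        """Poll only if the thread has used its CPU budget; return True if polled."""
        now = self._read_cpu_time()
        if now - self._cpu_at_last_poll < self._cpu_budget_s:
            return False            # thread was mostly idle: skip the poll
        self._poll_buffer()
        self._cpu_at_last_poll = now
        return True
```

An idle or descheduled thread accumulates almost no CPU time, so `maybe_poll` keeps returning `False` at the cost of one cheap clock read, while a busy thread reaches the budget quickly and is polled more often.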

The algorithm follows:

1. Initi...