
Fixed latency syscall emulation layer for performance monitoring Disclosure Number: IPCOM000022503D
Original Publication Date: 2004-Mar-18
Included in the Prior Art Database: 2004-Mar-18
Document File: 3 page(s) / 46K


This article describes a method for increasing the determinism of a system's runtime performance, so that more accurate performance measurements can be taken.


Fixed latency syscall emulation layer for performance monitoring

We need to measure the performance of our software. There are two aspects to this: we need to know how well our software performs so that we can publish that information to our customers, and so that we can determine the effect of code changes.

    Unfortunately, performance measurements are affected by things other than the software under test. Services run by the OS, background processing of hardware interrupts, and many other things have an impact. The issue is not that these things add a large amount of time to our measurements, but rather that they add an unpredictable and often unmeasurable time to them. This matters because in mature products it is typically only possible to make small performance improvements, of say 1 or 2%, compared with which the delays introduced by the system are relatively large. In most cases, it is not possible to measure performance differences of less than 5% with any reasonable degree of certainty.
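The problem described above can be made concrete by timing the same workload repeatedly and comparing the run-to-run spread against the improvement one hopes to detect. The following is an illustrative Python sketch (the disclosure itself does not specify a language); the workload and run count are arbitrary choices.

```python
import statistics
import time

def workload() -> int:
    """A small CPU-bound task standing in for the software under test."""
    return sum(i * i for i in range(50_000))

def measure(runs: int = 20) -> tuple[float, float]:
    """Time the workload repeatedly; return (mean seconds, relative spread)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    # Relative spread: run-to-run noise as a fraction of the mean.
    # If this exceeds the improvement being sought (e.g. 1-2%), the
    # measurement cannot confirm or deny that improvement.
    spread = (max(samples) - min(samples)) / mean
    return mean, spread

if __name__ == "__main__":
    mean, spread = measure()
    print(f"mean = {mean * 1e3:.3f} ms, spread = {spread * 100:.1f}% of mean")
```

On a typical desktop system the spread routinely exceeds a few percent of the mean, which is exactly why a 1% code change is invisible without reducing the sources of nondeterminism.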

    It is not, in general, possible to identify and discount the delays caused by external events. However, we can minimise the frequency of these events, which should enable performance differences of 1% to be measured with reasonable certainty.

    The majority of the unpredictable latencies come from the following factors:
Cache hits and misses
Paging
System calls (context switches)

    The major influences on cache performance are process migration and process interleaving. If a process is kept running continuously on one CPU, it can make the most of the cache; if it is repeatedly migrated from one CPU to another, it will repeatedly incur cache misses. I estimate the effect of this to be as much as 20%, based on tests carried out on a twin Athlon machine running Windows XP* -- a matrix multiplication program ran 20% faster when restricted to a single CPU.
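The single-CPU restriction used in the test above was presumably done with Windows XP's SetProcessAffinityMask. As a hedged illustration, the equivalent on Linux can be sketched in Python with os.sched_setaffinity; the CPU index 0 is an assumption and must be a CPU that exists on the machine.

```python
import os

def pin_to_cpu(cpu: int = 0) -> None:
    """Restrict the current process to a single CPU so it keeps its
    cache warm and is never migrated.

    Linux-only sketch; on Windows the analogous call is
    SetProcessAffinityMask. `cpu` must be a valid CPU index.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"

if __name__ == "__main__":
    pin_to_cpu(0)
    print("now restricted to CPUs:", os.sched_getaffinity(0))
```

Pinning the measurement process this way removes the migration component of the noise, though interleaving with other processes on the same CPU remains.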

    Process interleaving has an obvious effect -- replacing the current process with a new one will cause both to spend the early part of their timeslice repopulating the cache.

    Paging is a mechanism used by modern operating systems to allow processes to collectively use more main memory than the system has available. This involves saving pages of memory to disk and retrieving them when they are accessed. If an accessed page must be loaded from disk, a huge latency will be introduced, because a context switch must occur, and a lot of IO must happen before the process can resume execution.

    System calls have terrible latency issues. Each system call requires a minimum of two context switches in addition to the time required to perform the requested processing. They also typically cause huge amounts of cache pollution. Worse, on at least one popular operating system, the scheduler is re-run before the return from each system call, so it provides an extra point at which the running proce...
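The per-call overhead claimed above is easy to demonstrate by timing a loop of a trivial system call against a loop of a pure user-space call. This Python sketch is an assumption-laden illustration, not the disclosure's method: it assumes `os.getpid` enters the kernel on each call (true on modern Linux/glibc, where getpid caching was removed) and the iteration count is arbitrary.

```python
import os
import time

def time_loop(fn, iters: int = 100_000) -> float:
    """Return the average nanoseconds per call of fn over iters iterations."""
    start = time.perf_counter_ns()
    for _ in range(iters):
        fn()
    return (time.perf_counter_ns() - start) / iters

if __name__ == "__main__":
    # os.getpid makes one user -> kernel -> user round trip per call;
    # the lambda stays entirely in user space.
    syscall_ns = time_loop(os.getpid)
    user_ns = time_loop(lambda: 0)
    print(f"getpid: {syscall_ns:.0f} ns/call, "
          f"pure user-space call: {user_ns:.0f} ns/call")
```

The gap between the two figures is the fixed kernel-entry cost; the variation in the syscall figure across runs is the unpredictable component that a fixed-latency emulation layer would replace with a constant.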