User level writing to a pinned kernel buffer in an SMP system

IP.com Disclosure Number: IPCOM000013033D
Original Publication Date: 2000-Aug-01
Included in the Prior Art Database: 2003-Jun-12
Document File: 3 page(s) / 46K

Publishing Venue

IBM

Abstract

Disclosed is an algorithm that allows concurrent reading and writing of trace records to a single trace buffer without requiring a transition to supervisor mode in an SMP environment. There is a significant performance advantage to writing records without requiring supervisor mode. On NT*, the transition to kernel mode can take 1000-2000 cycles.

This is the abbreviated version, containing approximately 52% of the total text.

This algorithm handles or avoids the following potential problems:
(I) Multiple processors attempting to write data to the same area concurrently
(II) Application record writes that are still in progress when recording is completed (this can occur with deadlocks and abended processes)
(III) Application records that are written out of time-sequence order
(IV) Writes of kernel records concurrent with writes of application records
(V) Providing enough information in the user records to resolve what is required for thread-oriented call stack presentation (arcflow)

Solution: The buffer is pinned, because kernel trace records must be written where paging is not allowed, and it is mapped into user space so that applications can access it directly. Trace records are written on 4-byte boundaries with a self-defined record format. The trace buffer is initialized at each 4-byte boundary with a recognizable invalid record identifier.

Each trace record carries an indicator that marks it as still being written. One way of doing this is to write the record with an "invalid" indicator as part of the length field and to rewrite the length field when the write completes. Alternatively, each trace record could have a trailer byte that indicates the record has been fully written. In that case, the trailer must differ from the value used to initialize the buffer. One way of implementing this is to initialize the entire buffer to a specific value, for example, 0xffffffff.
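The length-field variant described above can be sketched in C as follows. This is a minimal single-writer illustration, not the disclosed implementation: the names `trace_init`, `trace_write`, and `trace_is_complete`, the one-word record header, and the use of the top bit of the length word as the in-progress flag are all assumptions made for the example. Only the 0xffffffff fill value comes from the text. A real SMP implementation would also need memory barriers so that the payload becomes visible before the final length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACE_INVALID     0xffffffffu /* fill value suggested in the text */
#define TRACE_IN_PROGRESS 0x80000000u /* hypothetical flag bit in the length word */

/* Mark every 4-byte slot of the buffer as an invalid record. */
static void trace_init(uint32_t *buf, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        buf[i] = TRACE_INVALID;
}

/*
 * Write one record at word offset `at` and return the offset of the next
 * record. The header word holds the payload length in words (assumed
 * layout). It is first written with the in-progress flag set and is
 * rewritten without the flag once the payload is in place.
 */
static size_t trace_write(uint32_t *buf, size_t at,
                          const uint32_t *payload, uint32_t n_words)
{
    assert(n_words < TRACE_IN_PROGRESS);    /* length must not collide with the flag */
    buf[at] = TRACE_IN_PROGRESS | n_words;  /* record is being written */
    memcpy(&buf[at + 1], payload, n_words * sizeof(uint32_t));
    buf[at] = n_words;                      /* record is complete */
    return at + 1 + n_words;
}

/* A reader accepts a header only if it is neither untouched nor in progress. */
static int trace_is_complete(uint32_t header)
{
    return header != TRACE_INVALID && (header & TRACE_IN_PROGRESS) == 0;
}
```

With this layout a reader can skip or flag any record whose header still carries the in-progress bit, which is one way to cope with writes that never finish (problem II above).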

To support multiple processors writing to the same buffer, some type of semaphore must be used. The semaphore must be held while the length portion of the trace record is updated, because the record length is what is used to step to the next trace record and so must be correct whenever the buffer is processed. Another semaphore must be used while updating the pointer to the next record.

Trace buffer: [header, record 1, record 2, ..., record n]
The header can contain the semaphore area, which is also the pointer to the next record.
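As an illustration of a header that guards the next-record pointer, the following C11 sketch reserves space for a record under a simple spin lock. It is an assumption-laden example rather than the disclosed design: the struct layout, the names `trace_buffer`, `trace_buffer_init`, and `trace_reserve`, the fixed size `TRACE_WORDS`, and the choice of `atomic_flag` as the semaphore are all hypothetical. For readability the lock and the offset are kept as separate fields, whereas the text notes that the semaphore area can double as the pointer to the next record.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_WORDS 1024u        /* hypothetical buffer size in 4-byte words */
#define TRACE_FULL  UINT32_MAX   /* returned when the record does not fit */

struct trace_buffer {
    atomic_flag lock;            /* semaphore guarding `next` */
    uint32_t    next;            /* word offset of the next free record */
    uint32_t    words[TRACE_WORDS];
};

static void trace_buffer_init(struct trace_buffer *tb)
{
    atomic_flag_clear(&tb->lock);
    tb->next = 0;
}

/*
 * Reserve `n_words` words for one record. Returns the word offset of the
 * reserved area, or TRACE_FULL if the buffer has no room left. The lock is
 * held only while the next-record offset is read and advanced.
 */
static uint32_t trace_reserve(struct trace_buffer *tb, uint32_t n_words)
{
    uint32_t at = TRACE_FULL;

    while (atomic_flag_test_and_set_explicit(&tb->lock, memory_order_acquire))
        ;                        /* spin until the semaphore is free */

    if (n_words <= TRACE_WORDS - tb->next) {
        at = tb->next;
        tb->next += n_words;
    }

    atomic_flag_clear_explicit(&tb->lock, memory_order_release);
    return at;
}
```

Because each writer receives a disjoint range of words, the record payloads themselves can then be filled in without holding the lock.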

On some machines, acquiring the record semaphore and updating the record length can be done in a single instruction. On other machines, updating the record length may require multiple steps.
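One possible reading of the single-instruction case is a compare-and-swap on the record's length word: the swap succeeds only if the word still holds the invalid fill value, so claiming the record and storing its (in-progress) length happen in one atomic step. The sketch below uses C11 atomics to show that idea. The names `slot_claim` and `slot_complete` and the in-progress flag bit are assumptions for illustration, and the abbreviated text does not say which instruction the disclosure actually relies on.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_INVALID     0xffffffffu /* fill value suggested in the text */
#define SLOT_IN_PROGRESS 0x80000000u /* hypothetical in-progress flag bit */

/*
 * Try to claim the record whose header word is `*header`. On success the
 * header atomically changes from the invalid fill value to the flagged
 * length, and the caller owns the record. On failure another writer got
 * there first and the header is left unchanged.
 */
static bool slot_claim(_Atomic uint32_t *header, uint32_t n_words)
{
    uint32_t expected = SLOT_INVALID;
    return atomic_compare_exchange_strong(header, &expected,
                                          SLOT_IN_PROGRESS | n_words);
}

/* Publish the finished record by rewriting the length without the flag. */
static void slot_complete(_Atomic uint32_t *header, uint32_t n_words)
{
    atomic_store_explicit(header, n_words, memory_order_release);
}
```

On hardware without such an instruction, the same effect needs a separate lock around the length update, which corresponds to the multi-step case mentioned above.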

We define an algorithm that assumes that the semaphore is implemented as a...