Architectural Support and On-Chip Hardware for Flexible System Performance Monitoring via Software

IP.com Disclosure Number: IPCOM000104164D
Original Publication Date: 1993-Mar-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 4 page(s) / 130K

Publishing Venue

IBM

Related People

Bakoglu, HB: AUTHOR [+3]

Abstract

This article describes a flexible system performance monitoring method and apparatus. The described architectural features and the on-chip hardware enable monitoring of a large number of performance parameters and instruction statistics via software without requiring any external hardware. This method is applicable to a variety of computer types including microprocessors, minicomputers, mainframes and supercomputers.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 46% of the total text.

The scheme works as follows:

1.  Some CPU architectures (including IBM's RISC System/6000* POWER
    Architecture*) have Special-Purpose Registers (SPRs).  This
    article describes a method by which some of these SPRs can be
    used for performance monitoring.  If the architecture has no
    SPRs, or too few spare SPRs, a separate set of Performance
    Monitor Registers (PMRs) can be defined instead.  The choice
    depends on the structure of the Instruction Set Architecture
    (ISA) and the available opcode points.  In this article these
    registers are referred to generically as PMRs.

    If there are no free special-purpose registers and no room in
    the ISA opcode space, the PMRs can be placed in an on-chip test
    processor (COP) and programmed and read by scanning via an
    external support processor (ESP) rather than by MTSPR and MFSPR
    instructions.  This avoids using opcode space, but it is less
    widely usable because collecting the performance data then
    requires an external processor or a specially engineered version
    of the machine.  Using regular machine instructions to collect
    the performance data is much more flexible and lets any user
    take advantage of it.

2.  To make this scheme as flexible as possible, the architecture
    should leave undefined which performance signals are sampled
    (TLB miss, cache miss, load execution, etc.).  The signals can
    be chosen at the implementation stage, and different
    implementations of the same architecture may sample different
    signals.

3.  Define one of the PMRs (say PMR0) as the control register, and
    define the rest of the PMRs (PMR1 through PMRn) as performance
    data count registers.

4.  The contents of the control register PMR0 are decoded to
    determine which signals are sampled.  This gives a flexible way
    of sampling many signals without requiring a large number of
    performance count registers (PMR1-PMRn).  Hard-coding each
    sampled signal to its own PMR would require as many count
    registers as there are signals, which is too many.

5.  The 32-bit performance count registers (PMR1-PMRn) should be
    grouped as pairs of regi...
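
The control-register idea in steps 3 and 4 can be illustrated with a small software model.  This is only a sketch under stated assumptions: the signal list, the 4-bit select field per counter, the number of counters, and the names PerfMonitor, make_control, write_pmr and read_pmr are all hypothetical and do not come from the article, which deliberately leaves the sampled signals and the PMR0 encoding to each implementation.

```python
# Hypothetical model of PMR0 (control) selecting which signals PMR1..PMRn count.
# Signal names, field width and counter count are illustrative assumptions.

SIGNALS = ["tlb_miss", "cache_miss", "load_exec", "store_exec",
           "branch_taken", "fp_op", "int_op", "cycle"]

NUM_COUNTERS = 2          # PMR1 and PMR2 in this sketch
FIELD_BITS = 4            # assumed: one 4-bit select field per counter in PMR0
MASK32 = 0xFFFFFFFF       # count registers are 32 bits wide


class PerfMonitor:
    def __init__(self):
        # pmr[0] is the control register; pmr[1..n] are count registers.
        self.pmr = [0] * (NUM_COUNTERS + 1)

    def write_pmr(self, n, value):
        """Stand-in for an MTSPR-style write to PMRn."""
        self.pmr[n] = value & MASK32

    def read_pmr(self, n):
        """Stand-in for an MFSPR-style read of PMRn."""
        return self.pmr[n]

    def selected_signal(self, counter):
        """Decode PMR0 to find the signal routed to the given counter."""
        shift = (counter - 1) * FIELD_BITS
        index = (self.pmr[0] >> shift) & ((1 << FIELD_BITS) - 1)
        return SIGNALS[index] if index < len(SIGNALS) else None

    def clock(self, active):
        """Advance one cycle; 'active' is the set of signals asserted."""
        for counter in range(1, NUM_COUNTERS + 1):
            if self.selected_signal(counter) in active:
                # 32-bit counters wrap around on overflow in this model.
                self.pmr[counter] = (self.pmr[counter] + 1) & MASK32


def make_control(*names):
    """Build a PMR0 value routing names[i] to counter i + 1."""
    value = 0
    for i, name in enumerate(names):
        value |= SIGNALS.index(name) << (i * FIELD_BITS)
    return value


mon = PerfMonitor()
mon.write_pmr(0, make_control("cache_miss", "load_exec"))
mon.clock({"cache_miss", "load_exec"})
mon.clock({"load_exec", "tlb_miss"})
print(mon.read_pmr(1), mon.read_pmr(2))
```

In this model two count registers can observe any of eight signals simply by rewriting PMR0, whereas hard-wiring would need eight counters.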