Technique for Speculatively Sampling Performance Parameters

IP.com Disclosure Number: IPCOM000113721D
Original Publication Date: 1994-Sep-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 193K

Publishing Venue

IBM

Related People

Dwyer, H: AUTHOR (and four others)

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 29% of the total text.

      In multiprocessor systems that use multiple caches whose
coherency is managed by hardware, it is important to provide
information about the migration of cache lines from processor to
processor.  Measurements of existing multiprocessor systems show that
cache line migration accounts for a large portion of memory data
traffic.  Reducing cache line migration therefore removes a large
portion of the memory delay imposed on the processors of a
multiprocessor system, thereby increasing system performance.

      To take advantage of this opportunity, software developers must
be given information about cache line migration.  A very useful form
for such data is a histogram in which the X-axis represents addresses
of shared data structures and the Y-axis represents the frequency of
data cache line migrations.  A similar histogram, in which the X-axis
represents instruction addresses and the Y-axis represents the
frequency with which the instruction at each address causes a data
cache line migration, is also useful.
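      As a rough illustration, the two histograms above could be
accumulated from a stream of migration events as sketched below.  This
is a minimal Python sketch under stated assumptions, not the disclosed
implementation: the MigrationEvent record, the 128-byte LINE_SIZE, and
the function name migration_histograms are hypothetical, and data
addresses are grouped by cache line so that each bar corresponds to
one line.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple

# Hypothetical cache line size in bytes; the disclosure does not give one.
LINE_SIZE = 128


class MigrationEvent(NamedTuple):
    """One observed data cache line migration (hypothetical record layout)."""
    instruction_addr: int  # address of the instruction that caused the migration
    data_addr: int         # effective address of the migrated data


def migration_histograms(events: Iterable[MigrationEvent]) -> Tuple[Counter, Counter]:
    """Build the two histograms described in the text.

    Returns (by_data_line, by_instruction):
      by_data_line   maps a cache-line base address -> number of migrations
      by_instruction maps an instruction address    -> migrations it caused
    Data addresses are bucketed by cache line, so accesses to different
    bytes of the same line count toward the same bar.
    """
    by_data_line: Counter = Counter()
    by_instruction: Counter = Counter()
    for ev in events:
        line_base = ev.data_addr // LINE_SIZE * LINE_SIZE
        by_data_line[line_base] += 1
        by_instruction[ev.instruction_addr] += 1
    return by_data_line, by_instruction


events = [
    MigrationEvent(0x1000, 0x20000),
    MigrationEvent(0x1000, 0x20008),  # same cache line as the event above
    MigrationEvent(0x1040, 0x20080),  # next cache line
]
data_hist, insn_hist = migration_histograms(events)
print({hex(a): n for a, n in data_hist.items()})  # {'0x20000': 2, '0x20080': 1}
print({hex(a): n for a, n in insn_hist.items()})  # {'0x1000': 2, '0x1040': 1}
```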

      Similar histograms can be constructed for the case in which
there is no migration and the access is instead satisfied by the next
levels of the memory hierarchy.  For example, a histogram of data
effective addresses versus the frequency of actual physical memory
accesses is also useful to a software developer.  If the correlation
between data structure accesses and instruction accesses is known, a
software developer can determine which instructions (and therefore
which software functions) are causing memory hierarchy delays, i.e.,
which are "hot spots".
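      One way to capture the correlation between data accesses and
instructions is to key each memory-access sample first by data cache
line and then by the function containing the instruction that made the
access.  The sketch below assumes a hypothetical sorted symbol table
(SYMBOLS) with made-up function names, treats each function as
extending up to the next symbol's start address, and uses a
hypothetical 128-byte LINE_SIZE; none of these come from the
disclosure.

```python
import bisect
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

LINE_SIZE = 128  # hypothetical cache line size in bytes

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x1000, "lock_acquire"), (0x1400, "queue_insert"), (0x2000, "stats_update")]
_STARTS = [start for start, _ in SYMBOLS]


def function_for(insn_addr: int) -> str:
    """Map an instruction address to the function whose range contains it.

    Assumes each function extends to the next symbol's start address.
    """
    i = bisect.bisect_right(_STARTS, insn_addr) - 1
    return SYMBOLS[i][1] if i >= 0 else "<unknown>"


def correlate(samples: Iterable[Tuple[int, int]]) -> Dict[int, Counter]:
    """Group memory-access samples by data cache line, then by function.

    Each sample is an (instruction_addr, data_addr) pair for an access
    that was satisfied from physical memory.
    Returns {line_base: Counter({function_name: count})}.
    """
    table: Dict[int, Counter] = defaultdict(Counter)
    for insn_addr, data_addr in samples:
        line_base = data_addr // LINE_SIZE * LINE_SIZE
        table[line_base][function_for(insn_addr)] += 1
    return table


samples = [(0x1010, 0x8000), (0x1420, 0x8010), (0x1010, 0x8040), (0x2004, 0x9000)]
for line, funcs in sorted(correlate(samples).items()):
    print(hex(line), dict(funcs))
# 0x8000 {'lock_acquire': 2, 'queue_insert': 1}
# 0x9000 {'stats_update': 1}
```

Reading the output row by row shows, for each hot data line, which
functions are responsible for the memory accesses to it.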

      Knowing which code and data areas are "hot spots" is the first
and most important information to collect when addressing the
performance problems that "hot spots" cause.  Possible remedies
include fragmenting data structures that force unnecessary sharing
and restructuring algorithms.  To rank the effort of eliminating or
reducing "hot spots", however, a developer must know which areas are
"hot spots" and to what degree each one is.
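      The degree to which each area is a "hot spot" can be expressed as
its share of all recorded events, which yields a simple ranking for
prioritizing effort.  In this minimal sketch the function
rank_hot_spots and the example counts are hypothetical; any histogram
keyed by code or data area (cache-line address, function name, and so
on) could serve as input.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def rank_hot_spots(histogram: Counter, top: int = 5) -> List[Tuple[Hashable, int, float]]:
    """Return the `top` hottest areas as (area, count, share_of_total).

    `histogram` maps a code or data area (for example a cache-line address
    or a function name) to the number of events attributed to it.
    """
    total = sum(histogram.values())
    if total == 0:
        return []  # nothing recorded, nothing to rank
    return [(area, count, count / total) for area, count in histogram.most_common(top)]


hist = Counter({"queue_insert": 60, "lock_acquire": 30, "stats_update": 10})
for area, count, share in rank_hot_spots(hist, top=2):
    print(f"{area:<14}{count:>6}{share:>8.1%}")
```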

      Preferred Approach - For L2 cache misses, a mechanism is
provided that generates an interrupt on every Nth L2 cache miss.
The hardware provides the interrupt handler with the effective
addresses of the instruction and of the operand being accessed.  The
interrupt handler saves this data while servicing the interrupt and
must also save certain software state indicators (process or thread
ID, etc.).
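      The sampling mechanism can be modeled in software to show what the
interrupt handler ends up recording.  In the sketch below, the
countdown register reloaded with N, the class MissSampler, and the
Sample record are assumptions for illustration; the disclosure states
only that an interrupt occurs on every Nth L2 miss and that the handler
saves the instruction and operand effective addresses plus software
state.  Scaling sample counts by N to estimate total misses is likewise
an added assumption and gives a statistical estimate, not an exact
count.

```python
from collections import Counter
from typing import List, NamedTuple


class Sample(NamedTuple):
    """Data the interrupt handler saves for one sampled L2 miss."""
    instruction_ea: int  # effective address of the instruction
    operand_ea: int      # effective address of the operand being accessed
    pid: int             # software state: process id
    tid: int             # software state: thread id


class MissSampler:
    """Software model of 'interrupt on every Nth L2 cache miss'.

    The countdown register reloaded with N is an assumed hardware design,
    used here only for illustration.
    """

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("sampling period N must be at least 1")
        self.n = n
        self.countdown = n
        self.samples: List[Sample] = []

    def on_l2_miss(self, instruction_ea: int, operand_ea: int, pid: int, tid: int) -> None:
        """Called for every L2 miss; raises the 'interrupt' on every Nth one."""
        self.countdown -= 1
        if self.countdown == 0:
            self.countdown = self.n  # re-arm for the next N misses
            self.interrupt_handler(instruction_ea, operand_ea, pid, tid)

    def interrupt_handler(self, instruction_ea: int, operand_ea: int, pid: int, tid: int) -> None:
        # Save the addresses supplied by the hardware plus the software state.
        self.samples.append(Sample(instruction_ea, operand_ea, pid, tid))

    def estimated_misses_by_instruction(self) -> Counter:
        """Scale per-instruction sample counts by N (a statistical estimate)."""
        counts = Counter(s.instruction_ea for s in self.samples)
        return Counter({ea: c * self.n for ea, c in counts.items()})


sampler = MissSampler(n=4)
for i in range(10):  # ten L2 misses, all from the instruction at 0x1000
    sampler.on_l2_miss(0x1000, 0x8000 + 8 * i, pid=7, tid=1)
print(len(sampler.samples))  # 2 (the 4th and 8th misses were sampled)
print(sampler.estimated_misses_by_instruction()[0x1000])  # 8
```

With N = 4 and ten misses, the handler runs only on the 4th and 8th
misses, so the scaled estimate for that instruction is 8 rather than
the true 10.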

      This data is collected and presented either in conventional
form with standard tools such as "aixtrace" [2] or in graphical form
with visualization programs such as "pv" [1].  Using the various
features of these tools, software developers can examine the data for
overall patterns and focus on details as appropriate.  Tools such as
"pv" are especially useful in MP systems...