Instruction Placement Method to Improve Cache Behavior

IP.com Disclosure Number: IPCOM000112656D
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 119K

Publishing Venue

IBM

Related People

White, SW: AUTHOR

Abstract

Disclosed is an instruction placement method for reducing the number of instruction cache misses as well as the average penalty per cache miss. Both goals are achieved by arranging code blocks in a manner which considers cache line boundaries.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 46% of the total text.


      Instruction cache misses substantially degrade performance for
many large codes.  Larger caches, second level caches, and faster
memory are well understood hardware solutions.  One common software
approach groups scattered, but frequently executed, code blocks.
Grouping frequently executed blocks improves the effectiveness of the
instruction cache by reducing thrashing.  The grouping is usually
based on execution-time profile information for the program.  This
type of approach attempts to minimize the number of instruction cache
misses.  The disclosed invention can further decrease the miss count
as well as decrease the miss penalty.  The disclosed invention can be
used separately or jointly with approaches which group frequently
executed (or "hot") code blocks.

      If the instruction working set is too large to fit in cache,
each instruction cache line reference may result in a cache miss.  If
the instruction footprint does not increase, minimizing the number of
cache lines referenced per code block minimizes the number of cache
misses.  Cache lines referenced can be minimized by starting each
code block on a cache line boundary.  Unfortunately, this simplistic
approach results in padding between the aligned "hot" blocks.  The
padding bytes are fetched but never used; therefore, the cache
becomes less fully utilized and the program's effective instruction
cache footprint grows.  Footprint growth may result in additional
capacity-related cache misses which degrade performance.
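The effect of block alignment on the number of cache lines referenced can be sketched as follows. This is an illustrative example only; the 64-byte line size and the block sizes are hypothetical, not taken from the disclosure.

```python
CLL = 64  # cache line length in bytes (illustrative choice)

def lines_touched(start: int, length: int, cll: int = CLL) -> int:
    """Number of cache lines spanned by a code block of `length`
    bytes whose first byte is at offset `start`."""
    first = start // cll
    last = (start + length - 1) // cll
    return last - first + 1

# A 100-byte block starting on a line boundary touches 2 lines,
# but the same block starting 40 bytes into a line touches 3.
print(lines_touched(0, 100))   # aligned start
print(lines_touched(40, 100))  # misaligned start
```

Forcing the aligned case by padding, however, inserts never-executed bytes between hot blocks, which is exactly the footprint growth the paragraph above warns against.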

      The proposed algorithm rearranges blocks to minimize the number
of cache line references per block without increasing the footprint.
First, the overall program is aligned on a cache line boundary.  The
program is partitioned into moveable code blocks; initially, all
procedures are blocks.  Based on heuristics, blocks are subdivided,
possibly down to the basic block level.  (An extra branch might be
added if the code below a conditional branch becomes a separately
moveable code block.)

      Since minimizing cache line references equates to minimizing
cache line boundary crossings, blocks are reordered to either start
or end at a cache line boundary.  The blocks are categorized by their
code block length MODULO the cache line length.  (Let CBL be the
modulo value and CLL be the cache line length.) The first grouping of
blocks removes those blocks for which CBL=0; for the remaining
blocks, pairs are selected for which CBL(1)+CBL(2) = CLL.  Due to the
selection criteria, "CBL=0" code blocks and pairs can be rearranged
without affecting the alignment of adjacent blocks.  The few
remaining unpaired blocks are placed a...
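The categorization and pairing step described above can be sketched in code. This is a minimal illustration under assumed block lengths, not the disclosed implementation: blocks with CBL=0 are alignment-neutral on their own, and two blocks whose residues sum to CLL are alignment-neutral as a pair, so either kind can be moved without disturbing the alignment of whatever follows.

```python
CLL = 64  # cache line length in bytes (illustrative choice)

def categorize(block_lengths, cll=CLL):
    """Partition blocks (given by their byte lengths) into the three
    groups described above: CBL=0 blocks, complementary pairs with
    CBL(1) + CBL(2) = CLL, and the few remaining unpaired blocks.
    Blocks are identified by their index in `block_lengths`."""
    zero_residue = []          # blocks with length MOD CLL == 0
    pairs = []                 # (i, j) with CBL(i) + CBL(j) == CLL
    waiting = {}               # residue -> indices still seeking a partner
    for idx, length in enumerate(block_lengths):
        cbl = length % cll
        if cbl == 0:
            zero_residue.append(idx)
        elif waiting.get(cll - cbl):
            pairs.append((waiting[cll - cbl].pop(), idx))
        else:
            waiting.setdefault(cbl, []).append(idx)
    unpaired = [i for idxs in waiting.values() for i in idxs]
    return zero_residue, pairs, unpaired

# Hypothetical block lengths: 128 (CBL=0), 40 and 24 (40+24=64), 10.
zero, pairs, unpaired = categorize([128, 40, 24, 10])
print(zero, pairs, unpaired)
```

Here block 0 moves freely, blocks 1 and 2 move as a pair, and block 3 is left over for the final placement step that the (truncated) text goes on to describe.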