Prefetching Branch Targets Via Data Bus

IP.com Disclosure Number: IPCOM000102226D
Original Publication Date: 1990-Nov-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 2 page(s) / 71K

Publishing Venue

IBM

Related People

Hsu, PYT: AUTHOR [+2]

Abstract

A technique is described whereby a computer data bus is used instead of an instruction bus to prefetch branch targets from cache. This increases the probability that the first target fetch will contain the entire target instruction and improves performance because of the reduced breakage.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Computer processors with a unified cache that contains both
instructions and data usually transfer both instructions and data
through a single, relatively wide bus. Sharing the cache and the bus
introduces contention between data accesses and instruction fetches,
resulting in performance degradation.

      In many cache organizations, the data-array bandwidth is much
higher than the bus bandwidth.  In the prior art, a block of
instructions is prefetched quickly out of the cache data array in a
single cycle and saved in a wide register within the cache module,
then transferred to the processor via a separate instruction bus at a
slower pace over several cycles.  This technique eliminates bus
contention between instruction fetches and data accesses and also
reduces cache access contention by lowering the frequency of
instruction-fetch accesses to the cache data array.
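
      The prior-art staging scheme described above can be sketched as a
small Python model.  This is only an illustrative sketch, not the
disclosed hardware: the class name StagedInstructionFetch, the helper
drain, the 16-byte block width and the 2-byte instruction bus width are
all assumptions chosen for the example and do not come from the
disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StagedInstructionFetch:
    """Toy model: one wide data-array read, then several narrow bus transfers."""

    block_bytes: int = 16   # assumed width of the wide register in the cache module
    ibus_bytes: int = 2     # assumed width of the separate instruction bus
    array_reads: int = 0    # instruction-fetch accesses made to the cache data array
    _staged: bytes = b""    # current contents of the wide register
    _pos: int = 0           # offset of the next byte to send over the instruction bus

    def prefetch_block(self, cache: bytes, addr: int) -> None:
        """Read the whole block containing addr from the data array in one access."""
        base = addr - (addr % self.block_bytes)
        self._staged = cache[base:base + self.block_bytes]
        self._pos = 0
        self.array_reads += 1

    def bus_cycle(self) -> Optional[bytes]:
        """Send the next narrow slice to the processor, or None once drained."""
        if self._pos >= len(self._staged):
            return None
        beat = self._staged[self._pos:self._pos + self.ibus_bytes]
        self._pos += self.ibus_bytes
        return beat


def drain(fetch: StagedInstructionFetch, cache: bytes, addr: int) -> list:
    """Prefetch one block, then collect every instruction-bus transfer it yields."""
    fetch.prefetch_block(cache, addr)
    beats = []
    while (beat := fetch.bus_cycle()) is not None:
        beats.append(beat)
    return beats
```

      With the assumed widths, a single access to the data array feeds
eight instruction-bus cycles, which is how the scheme lowers the
frequency of instruction-fetch accesses to the cache data array.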

      For performance reasons, the data bus width is usually
determined by the largest frequently encountered data object, such as
a 64-bit floating-point number.  However, the typical instruction
length may be only 32 bits, and the average execution time of an
instruction may be several cycles.  This allows the transfer of an
entire instruction to be spread over several cycles.  As a result,
the dedicated instruction bus is much narrower than the dedicated
data bus and contributes to the cost effectiveness of using dedic...