Compare/Load Unit

IP.com Disclosure Number: IPCOM000113995D
Original Publication Date: 1994-Oct-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 55K

Publishing Venue

IBM

Related People

Muhich, JS: AUTHOR [+2]

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 66% of the total text.

Compare/Load Unit

      The following is a description of an execution unit which can
be used to increase performance on silicon-bound processors.  Two
instructions which are typically performance bottlenecks in low-end
processors are load and compare instructions.  Load instructions
usually have at least one delay slot (assuming a cache hit) and
account for 20-30% of the instructions in many typical applications
(e.g., the SPECmark suite).  Compare instructions produce results
which are used by the branch processor to determine control flow.
Since control flow decisions are made at the fetch stage, compare
instructions effectively have two or three delay slots (depending on
the number of cycles between fetch and execute).
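
      As a rough illustration of why these two instruction types
matter, the sketch below estimates the average stall cycles per
instruction caused by unfilled delay slots.  The function name, the
10% compare frequency, and the assumption that every delay slot goes
unfilled are illustrative choices, not figures from this disclosure;
only the 20-30% load frequency and the delay slot counts come from
the text above.

```python
def stall_cycles_per_instruction(load_fraction, load_delay_slots,
                                 compare_fraction, compare_delay_slots,
                                 unfilled_fraction=1.0):
    """Average stall cycles added per instruction by delay slots.

    unfilled_fraction is the (assumed) share of delay slots that the
    compiler or hardware cannot fill with useful work.
    """
    load_stalls = load_fraction * load_delay_slots
    compare_stalls = compare_fraction * compare_delay_slots
    return unfilled_fraction * (load_stalls + compare_stalls)


# Hypothetical mix: 25% loads with one delay slot (within the 20-30%
# range quoted above) and an ASSUMED 10% compares with two delay slots.
baseline = stall_cycles_per_instruction(0.25, 1, 0.10, 2)

# The same mix if early execution removed one delay slot from each.
improved = stall_cycles_per_instruction(0.25, 0, 0.10, 1)

print(f"baseline stalls/instr: {baseline:.2f}")
print(f"improved stalls/instr: {improved:.2f}")
```

      Under these assumed numbers the delay slots alone would add
close to half a cycle to every instruction, which suggests why
shortening them is worthwhile even on a small design.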

      We propose early execution of load and compare instructions in
an execution unit which looks ahead for these instructions and
attempts to pre-process them.  General out-of-order execution would
also hide these delays, but it requires expensive control and data
flow circuits and is, therefore, unattractive for low-end
silicon-bound designs.

      Instead of removing the load and compare instructions from the
instruction stream, we simply copy these instructions to the
compare/load unit where (barring data dependencies) their results are
calculated and can be forwarded to their dependent instructions.
Since the compare and load results are determined earlier, the number
of delay slots associated with each is reduced.  The execution unit
is shown in the Figure.
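
      The look-ahead idea can be sketched in software as follows.
This is a minimal model under stated assumptions, not the circuit
shown in the Figure: the Instr record, the lookahead window size, the
rule that stores conservatively block loads, and the
one-instruction-per-cycle timing are all hypothetical.  For each load
or compare, the model counts how many older instructions it could
pass before reaching one that writes one of its source registers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                      # e.g. "load", "cmp", "add", "store", "branch"
    dest: str = ""               # register written, "" if none
    srcs: Tuple[str, ...] = ()   # registers read


EARLY_OPS = {"load", "cmp"}      # instructions copied to the compare/load unit


def cycles_early(program: List[Instr], lookahead: int) -> Dict[int, int]:
    """Map each load/compare index to the number of older instructions
    it can pass, limited by the lookahead window and data dependencies."""
    result: Dict[int, int] = {}
    for i, ins in enumerate(program):
        if ins.op not in EARLY_OPS:
            continue
        passed = 0
        # Walk backwards over at most `lookahead` older instructions.
        for j in range(i - 1, max(i - 1 - lookahead, -1), -1):
            older = program[j]
            if older.dest and older.dest in ins.srcs:
                break            # data dependency: must wait for this result
            if ins.op == "load" and older.op == "store":
                break            # ASSUMPTION: a store may alias the load
            passed += 1
        result[i] = passed
    return result


def remaining_delay_slots(base_slots: int, early: int) -> int:
    """Delay slots left after executing `early` cycles ahead of schedule."""
    return max(0, base_slots - early)


# Hypothetical instruction sequence; register names are illustrative.
EXAMPLE = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("add", "r4", ("r5", "r6")),
    Instr("load", "r7", ("r1",)),        # address depends on instruction 0
    Instr("add", "r8", ("r7", "r4")),
    Instr("cmp", "cr0", ("r4", "r9")),   # operand depends on instruction 1
    Instr("branch", "", ("cr0",)),
]

early = cycles_early(EXAMPLE, lookahead=3)
for index, n in early.items():
    print(index, EXAMPLE[index].op, "can run", n, "instruction(s) early")
```

      In this model the load and compare stay in their original
program positions, matching the copy-rather-than-remove approach
described above; only the point at which their results become
available moves earlier.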

   ...