
Zero-Cycle Branches in Simple RISC Designs

IP.com Disclosure Number: IPCOM000120079D
Original Publication Date: 1991-Mar-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 7 page(s) / 189K

Publishing Venue

IBM

Related People

Grohoski, GF: AUTHOR

Abstract

Reducing the pipeline delay caused by branch instructions has been one of the fundamental problems of computer design. Consequently, there are many approaches to attacking this problem.

      Reducing the pipeline delay caused by branch instructions
has been one of the fundamental problems of computer design.
Consequently, there are many approaches to attacking this problem.

      In the RISC System/6000* the approach has been to provide a
separate branch execution dataflow which executes branches without
interfering with or using standard fixed-point instruction execution
resources.  The branch execution unit attempts to make branches all
but invisible to the fixed-point and floating-point execution units,
hence the term "zero-cycle" branch.
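      The division of labor described above can be sketched as a
routing step in front of two execution units.  This is a minimal
Python model, not the RISC System/6000 dispatch logic: the `Instr`
type, its `is_branch` flag, and the mnemonics are hypothetical, and
each non-branch instruction is assumed to occupy one fixed-point
execute slot.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str          # hypothetical mnemonic, e.g. "add" or "bc"
    is_branch: bool  # True if the separate branch unit consumes it


def dispatch(stream):
    """Route each instruction to the branch unit or the fixed-point unit.

    Branches go to the separate branch dataflow, so the fixed-point
    unit's queue holds only non-branch work.
    """
    branch_unit, fixed_point_unit = [], []
    for instr in stream:
        if instr.is_branch:
            branch_unit.append(instr)
        else:
            fixed_point_unit.append(instr)
    return branch_unit, fixed_point_unit


stream = [
    Instr("add", False),
    Instr("load", False),
    Instr("bc", True),   # conditional branch
    Instr("sub", False),
    Instr("b", True),    # unconditional branch
]
bu, fxu = dispatch(stream)
print([i.op for i in fxu])  # ['add', 'load', 'sub']
print(len(fxu))             # 3 fixed-point execute slots
```

Only the three non-branch instructions consume fixed-point slots in
this model; a design in which the fixed-point unit also executed the
two branches would need five.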

      While the RISC System/6000 can dispatch up to four instructions
per cycle, simpler design points are possible which dispatch fewer
instructions per cycle yet are capable of overlapping the execution
of branch instructions with fixed- and floating-point instructions.

      The required control logic appears to be small, and it provides
a gain of approximately 0.2 to 0.3 cycles per instruction over designs
that require the fixed-point unit to execute branches directly.
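      One simple way to reason about a gain of this kind is as the
fraction of executed instructions that are branches multiplied by the
fixed-point cycles saved per branch.  The sketch below assumes that
product model; `cpi_gain` and both of its inputs are hypothetical, and
the example values are illustrative only, not measurements from this
disclosure.

```python
def cpi_gain(branch_fraction: float, cycles_saved_per_branch: float) -> float:
    """Estimated drop in cycles per instruction from overlapping branches.

    branch_fraction: share of executed instructions that are branches (0..1).
    cycles_saved_per_branch: fixed-point cycles no longer spent per branch.
    Both are hypothetical model inputs, not values from this disclosure.
    """
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must lie in [0, 1]")
    return branch_fraction * cycles_saved_per_branch


# Illustrative only: one instruction in five is a branch, and each
# overlapped branch saves one fixed-point cycle.
print(cpi_gain(0.2, 1.0))  # 0.2
```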

      Fig. 1 depicts some salient features of a typical
single-instruction issue RISC processor design.  Illustrated are
instruction buffers, their associated address registers, and an
interface to the fixed-point execution unit.  There is also a
separate branch address adder, which generates relative-displacement
branch target addresses during the decode cycle, so that the target
can be fetched from the cache during the next cycle in case the branch
is taken.  Assuming that the target returns during the execution cycle
of the branch, it may be decoded during the next cycle and executed
one cycle later.
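      The timing described above for a taken relative branch decoded
in cycle t can be laid out cycle by cycle.  This sketch only restates
that timing; the function name and event labels are hypothetical, and
it assumes, as the text does, that the target returns from the cache
during the branch's execution cycle.

```python
from __future__ import annotations


def taken_branch_timeline(t: int) -> list[tuple[int, str]]:
    """Events for a taken relative branch decoded in cycle t (Fig. 1 design)."""
    return [
        (t, "branch decode; branch adder forms target address"),
        (t + 1, "branch execute; target fetched from cache"),
        (t + 2, "target decode"),
        (t + 3, "target execute"),
    ]


for cycle, event in taken_branch_timeline(1):
    print(f"cycle {cycle}: {event}")
```

Under these assumptions the branch occupies the execute stage in
cycle t+1 and the target in cycle t+3, so no instruction from the
taken path executes in cycle t+2.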

      If the branch is conditional and is not taken, the target is
discarded.  This unnecessary fetch may cause the instruction buffer
to become empty in certain cases, but this performance loss will be
ignored here.
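      The fetch-then-discard behavior can be modeled as a small
resolution step.  In this sketch the target address is always formed
and fetched at decode, and the branch outcome only decides whether
that fetch is used.  The names `resolve_branch` and `FetchOutcome`,
and the 4-byte instruction size, are assumptions for illustration.

```python
from dataclasses import dataclass

INSTR_BYTES = 4  # assumption: fixed-length 32-bit instructions


@dataclass
class FetchOutcome:
    next_instr_address: int    # where decoding continues after the branch
    target_fetch_wasted: bool  # True if the speculative target fetch is dropped


def resolve_branch(branch_address: int, displacement: int,
                   taken: bool) -> FetchOutcome:
    # Decode cycle: the branch adder forms the relative target, and the
    # target is fetched from the cache in the next cycle whatever the outcome.
    target = branch_address + displacement
    if taken:
        # Taken: the fetched target is decoded next.
        return FetchOutcome(target, target_fetch_wasted=False)
    # Not taken: discard the fetched target and continue sequentially.
    return FetchOutcome(branch_address + INSTR_BYTES, target_fetch_wasted=True)


print(resolve_branch(0x1000, 0x40, taken=True))
print(resolve_branch(0x1000, 0x40, taken=False))
```

An unconditional branch corresponds to `taken=True`.  The model does
not capture the instruction-buffer underflow that the wasted fetch can
cause, which the text also sets aside.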

      Thus, in this design, every branch requires...