Browse Prior Art Database

Masked Delayed Branches for Pipelined Computer Processors

IP.com Disclosure Number: IPCOM000102486D
Original Publication Date: 1990-Nov-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 6 page(s) / 228K

Publishing Venue

IBM

Related People

Hsu, PYT: AUTHOR

Abstract

A technique is described whereby the use of masked delayed branches enables a compiler to schedule statistically useful subject instructions and allows pipelined execution of a sequence of branches so as to reduce the average delay for each branch operation.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 25% of the total text.

Masked Delayed Branches for Pipelined Computer Processors

       A technique is described whereby the use of masked
delayed branches enables a compiler to schedule statistically useful
subject instructions and allows pipelined execution of a sequence of
branches so as to reduce the average delay for each branch operation.

      Typically, delayed branch operations enable a pipelined
processor to execute instructions in parallel with a branch target
fetch.  The concept described herein implements a mechanism called
the masked delayed branch so as to selectively inhibit the execution
of delayed = branch subject instructions.  The masked delayed branch
operation is designed to provide the following functions:
     Enables a compiler to schedule statistically useful subject
instructions in situations where the use of normal delayed branches
would require the insertion of useless no-operation subject
instructions.
  Allows pipelined execution of a sequence of taken branches.
  Provides a simple implementation that is fast and compact.

      The delayed branch is a simple technique for reducing branch
penalty in pipelined microprocessors [1,2,3].  To exploit delayed
branches, a compiler must find suitable instructions that can be
moved into positions immediately following the branch instruction.
If suitable instructions cannot be found, then no-ops must be
inserted. Optimization of delayed branches is considered nontrivial,
and the insertion of no-ops is inevitable in practice [4].

      The use of guarded instructions has been proposed for reducing
the number of no-ops that must be inserted in highly concurrent
processors with long pipelines [5].  We describe a means by which
no-ops, which are never useful, are replaced with instructions that
may or may not be useful and to guard these instructions with
appropriate tests, such that only useful instructions are allowed to
execute.  The concept described herein provides a simple and
efficient implementation of guarded instructions.

      In focusing on a basic block B, terminating in a conditional
branch instruction b, if the branch is taken, control flows to the
branch target instruction, t+ . Otherwise, control falls-through to
the following instruction, t- .  If the conditional branch at
location i is replaced by a delayed branch of length n, then n
subject instructions immediately following b are unconditionally
executed.  The n instructions are in the delayed part of branch b.

      The problem is how to find useful instructions to move the
delayed part of branch b.  The best case is to move n instructions
from block B into the delayed part of branch b. These subject
instructions are always useful, since they would have been executed
prior to the branch.  However, it is not always possible to find n
movable instructions in B because of data dependency and/or small
basic block sizes.

      Instructions can also be moved from t+ and t- into the d...