New Approach to Eliminate Branch Cost in Pipelined Super Computers

IP.com Disclosure Number: IPCOM000101760D
Original Publication Date: 1990-Aug-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 5 page(s) / 159K

Publishing Venue

IBM

Related People

Karne, RK: AUTHOR [+2]

Abstract

One of the major requirements in designing a pipelined processor is to ensure a steady flow of instructions to the initial stages of the pipeline so as to achieve high performance. This flow is interrupted when branches are encountered. When a branch is encountered, the next instruction to be executed is the branch target if the branch is successful, or the next sequential instruction if the branch is not successful. But whether the branch is successful is not known at the instant immediately following the branch instruction, so the instruction issue unit does not know whether to issue the "branch target" or the "successor" instruction to the pipeline. If we delay the issue until the branch outcome is known, the pipeline stalls and performance suffers.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 42% of the total text.

       One of the major requirements in designing a pipelined
processor is to ensure a steady flow of instructions to the initial
stages of the pipeline so as to achieve high performance.  This flow
is interrupted when branches are encountered.  When a branch is
encountered, the next instruction to be executed is the branch target
if the branch is successful, or the next sequential instruction if
the branch is not successful.  But whether the branch is successful
is not known at the instant immediately following the branch
instruction, so the instruction issue unit does not know whether to
issue the "branch target" or the "successor" instruction to the
pipeline.  If we delay the issue until the branch outcome is known,
the pipeline stalls and performance suffers.  If we instead gamble
that the branch will be successful and issue the branch target to the
pipeline, and a few cycles later the branch proves successful as we
gambled, then we are home free; but if we learn subsequently that our
gamble was incorrect, the pipeline must be flushed.  The pipeline can
also stall if the branch target is not available immediately and
needs to be fetched from memory.

      These two problems shall henceforth be referred to as pipeline
stalling and pipeline flushing.  Both contribute to performance
degradation, since branches constitute anywhere from 15% to 30% of
the instructions executed on typical machines.  On higher performance
machines such instructions consume a large fraction of the execution
time, and the degradation in performance is considerable.
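
      As a rough illustration of how stalling and flushing translate
into lost performance, the sketch below estimates average cycles per
instruction for the two policies discussed above.  Only the 15% to
30% branch fraction comes from the text; the function names, the
base of one cycle per instruction, the 3-cycle resolution delay and
the 60% success probability are assumptions chosen for illustration.

```python
def stall_penalty(resolve_cycles: int) -> float:
    """Cycles lost per branch when issue waits until the outcome is known."""
    return float(resolve_cycles)


def gamble_penalty(resolve_cycles: int, p_successful: float) -> float:
    """Expected cycles lost per branch when the branch target is always issued.

    Assumes a correct gamble costs nothing and a wrong one flushes
    resolve_cycles worth of already issued instructions.
    """
    return (1.0 - p_successful) * resolve_cycles


def effective_cpi(base_cpi: float, branch_fraction: float, penalty: float) -> float:
    """Average cycles per instruction once branch penalties are included."""
    return base_cpi + branch_fraction * penalty


if __name__ == "__main__":
    RESOLVE_CYCLES = 3   # hypothetical cycles until the branch outcome is known
    P_SUCCESSFUL = 0.6   # hypothetical probability that a branch is successful
    for fraction in (0.15, 0.30):
        stall = effective_cpi(1.0, fraction, stall_penalty(RESOLVE_CYCLES))
        gamble = effective_cpi(
            1.0, fraction, gamble_penalty(RESOLVE_CYCLES, P_SUCCESSFUL)
        )
        print(f"branches={fraction:.0%}  stall CPI={stall:.2f}  gamble CPI={gamble:.2f}")
```

      The model ignores the extra stall that occurs when the branch
target must be fetched from memory, so it understates the cost of the
gamble in that case.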

      It is the object of the invention to provide a more efficient
organizational mechanism that solves the branch problem and reduces
the branch cost to zero, improves the performance of the CPU
significantly beyond what existing methods offer, uses the pipeline
resources more efficiently, and provides the flexibility to respond
to asynchronous events that hamper performance in pipelined super
computers.

      Fig. 1 details the block diagram of the proposed invention.
The processor has facilities to maintain several processes in
execution concurrently.  Each process has its own set of GPRs, PSW,
condition code register, instruction counter, I buffer and any other
facilities required by the architecture in question.  The system
state facility SYS_STATE points to the currently active process.  The
instruction fetch unit is asynchronous: it fetches instructions,
fills the I buffers of all the processes, and keeps them ahead of
execution.
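
      The per-process facilities and the fetch unit described above
can be pictured with the following minimal sketch.  It is a software
model under stated assumptions, not the hardware of Fig. 1: the class
and field names, the count of 16 GPRs, the I buffer depth of 4, the
separate fetch_address field and the list-of-strings memory are all
hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProcessContext:
    """Private facilities of one process (sizes are assumptions)."""
    gprs: list = field(default_factory=lambda: [0] * 16)  # general purpose registers
    psw: int = 0                  # program status word
    condition_code: int = 0       # condition code register
    instruction_counter: int = 0  # next instruction to execute
    fetch_address: int = 0        # hypothetical: next address to prefetch
    i_buffer: deque = field(default_factory=deque)  # prefetched instructions


class Processor:
    def __init__(self, num_processes: int, i_buffer_depth: int = 4):
        self.contexts = [ProcessContext() for _ in range(num_processes)]
        self.sys_state = 0        # index of the currently active process
        self.i_buffer_depth = i_buffer_depth

    def fetch(self, memory):
        """Top up the I buffer of every process, not only the active one."""
        for ctx in self.contexts:
            while (len(ctx.i_buffer) < self.i_buffer_depth
                   and ctx.fetch_address < len(memory)):
                ctx.i_buffer.append(memory[ctx.fetch_address])
                ctx.fetch_address += 1


if __name__ == "__main__":
    memory = [f"insn{i}" for i in range(32)]  # stand-in for instruction storage
    cpu = Processor(num_processes=2)
    cpu.contexts[1].fetch_address = 16        # second process starts elsewhere
    cpu.fetch(memory)
    for number, ctx in enumerate(cpu.contexts):
        print(number, list(ctx.i_buffer))
```

      Keeping a full context per process means that changing
SYS_STATE selects a different set of registers and a different I
buffer without copying any state.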

      The instruction issue logic picks up instructions from the I
buffer of the currently active process, decodes them, tags them with
the current SYS_STATE tag and issues them to the instruction decode
pipeline of the CPU.  From then on, instruction execution proceeds
as in any normal CPU but for one difference.  The SYS_STATE tag
accompanies the...
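
      The issue step described above (pick from the active process,
decode, tag with SYS_STATE, issue) can be sketched as follows.  The
names TaggedInstruction, decode and issue, the "OP a,b" instruction
text and the list used as a pipeline are assumptions made for
illustration; how later pipeline stages use the tag is not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedInstruction:
    opcode: str
    operands: tuple
    sys_state_tag: int  # process that was active when this was issued


def decode(raw: str):
    """Hypothetical decoder: split 'OP a,b' into an opcode and operands."""
    opcode, _, rest = raw.partition(" ")
    operands = tuple(part.strip() for part in rest.split(",") if part.strip())
    return opcode, operands


def issue(i_buffers, sys_state, pipeline):
    """Issue one instruction from the active process's I buffer, if any."""
    buffer = i_buffers[sys_state]
    if not buffer:
        return None           # nothing prefetched for the active process
    opcode, operands = decode(buffer.popleft())
    tagged = TaggedInstruction(opcode, operands, sys_state)
    pipeline.append(tagged)   # the tag travels with the instruction
    return tagged


if __name__ == "__main__":
    i_buffers = [deque(["L 1,X", "BC 8,LOOP"]), deque(["A 2,Y"])]
    pipeline = []
    issue(i_buffers, 0, pipeline)  # process 0 is active
    issue(i_buffers, 1, pipeline)  # SYS_STATE now points at process 1
    for entry in pipeline:
        print(entry)
```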