HARDWARE RETURN STACK PERFORMANCE OPTIMIZATION

IP.com Disclosure Number: IPCOM000009920D
Original Publication Date: 2000-May-01
Included in the Prior Art Database: 2002-Sep-27
Document File: 3 page(s) / 156K

Publishing Venue

Motorola

Related People

Joe Circello: AUTHOR [+3]

Abstract

All architectures of the ColdFire family of 32-bit embedded microprocessors implement a decoupled pipeline strategy, where the operation of the Instruction Fetch Pipeline (IFP) is decoupled from the Operand Execution Pipeline (OEP) through the use of a FIFO instruction buffer. This mechanism allows the IFP to prefetch instructions in advance of their actual use by the operand pipeline. In the Version 3 and Version 4 Instruction Fetch Pipelines, one stage is dedicated to performing time-critical decode functions on the prefetched instructions.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 40% of the total text.

MOTOROLA

Technical Developments

by Joe Circello, David Schimke and Jeff Freeman

INTRODUCTION

All architectures of the ColdFire family of 32-bit embedded microprocessors implement a decoupled pipeline strategy, where the operation of the Instruction Fetch Pipeline (IFP) is decoupled from the Operand Execution Pipeline (OEP) through the use of a FIFO instruction buffer. This mechanism allows the IFP to prefetch instructions in advance of their actual use by the operand pipeline. In the Version 3 and Version 4 Instruction Fetch Pipelines, one stage is dedicated to performing time-critical decode functions on the prefetched instructions.
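The decoupling described above can be pictured with a small software model. This is only a sketch under stated assumptions: the class and function names, the buffer depth, and the notion of "stall cycles" are hypothetical stand-ins, not details taken from the ColdFire design. It shows the fetch side continuing to fill a bounded FIFO while the execution side is stalled.

```python
from collections import deque


class InstructionBuffer:
    """Bounded FIFO between the fetch and execution sides (hypothetical model)."""

    def __init__(self, depth):
        self.depth = depth          # assumed parameter, not a ColdFire value
        self.entries = deque()

    def can_accept(self):
        return len(self.entries) < self.depth

    def push(self, instruction):
        if not self.can_accept():
            raise OverflowError("instruction buffer full")
        self.entries.append(instruction)

    def pop(self):
        # The oldest prefetched instruction leaves first (FIFO order).
        return self.entries.popleft() if self.entries else None


def simulate(program, depth, execute_stalls):
    """Step the fetch and execution sides independently, one cycle at a time.

    execute_stalls is a set of cycle numbers in which the execution side
    consumes nothing (an assumed stand-in for operand pipeline stalls).
    Returns the executed instructions and the peak buffer occupancy.
    """
    buffer = InstructionBuffer(depth)
    executed = []
    next_fetch = 0
    peak = 0
    cycle = 0
    while len(executed) < len(program):
        # Execution side: consume one instruction unless stalled.
        if cycle not in execute_stalls:
            instruction = buffer.pop()
            if instruction is not None:
                executed.append(instruction)
        # Fetch side: keep prefetching while the buffer has room.
        if next_fetch < len(program) and buffer.can_accept():
            buffer.push(program[next_fetch])
            next_fetch += 1
        peak = max(peak, len(buffer.entries))
        cycle += 1
    return executed, peak
```

In this model the program order is preserved, and the peak occupancy grows during stall cycles, which is the sense in which the fetch side runs ahead of execution.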

The branch acceleration scheme makes use of these two key factors:

1) Since the IFP and OEP pipelines are decoupled, the IFP is generally prefetching in advance of the actual execution pipeline,

2) Since the Instruction Early Decode (IED) stage provides information on instruction length, the stream of prefetched data can be assembled into machine instructions at that stage of the IFP, allowing identification of change-of-flow instructions.

Essentially, this design accelerates the execution of branch instructions by allowing the IED stage of the Instruction Fetch Pipeline to calculate branch target addresses and to switch the prefetch stream to the new target address immediately.
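A rough illustration of this idea follows. It is a toy model, not the ColdFire encoding: the tuple layout (length, kind, displacement), the function name, and the convention that the target is relative to the branch's own address are all assumptions made for the sketch, and only unconditional branches are handled. The point is that knowing each instruction's length lets the fetch side find instruction boundaries, spot a branch, and redirect prefetch at once.

```python
def early_decode_prefetch(memory, start, max_instructions):
    """Walk the prefetch stream, redirecting at unconditional branches.

    memory maps an instruction address to (length, kind, displacement).
    This encoding is invented for illustration only.
    Returns the list of instruction addresses prefetched, in order.
    """
    fetched = []
    address = start
    while len(fetched) < max_instructions and address in memory:
        length, kind, displacement = memory[address]
        fetched.append(address)
        if kind == "branch":
            # Assumed convention: target is relative to the branch address.
            address = address + displacement
        else:
            # The instruction length locates the next instruction boundary.
            address = address + length
    return fetched
```

With a branch at 0x106 and a displacement of 0x20, the model fetches 0x126 directly after the branch and never prefetches the fall-through instruction at 0x108.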

For architectures like ColdFire, which use a memory stack for subroutine call and return linkage, implementations of a hardware return stack can accelerate the performance of return instructions. It is well known that subroutine returns do not lend themselves to acceleration using traditional methods such as branch caches, since a single return instruction may transfer control to any number of different calling routines. A hardware return stack is the preferred method to provide acceleration to this important class of instructions. Simply stated, the hardware return stack functions as a LIFO (last-in, first-out) stack, where return addresses are pushed onto the stack whenever a call is executed, and popped off the stack whenever a return is executed.
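The LIFO behavior can be sketched in a few lines. The class name, the default depth, and the overflow and underflow policies below are assumptions for illustration; the text does not specify them.

```python
class HardwareReturnStack:
    """Fixed-depth LIFO of return addresses (hypothetical software model)."""

    def __init__(self, depth=4):
        self.depth = depth      # assumed parameter
        self.entries = []

    def push(self, return_address):
        # Assumed overflow policy: discard the oldest entry when full.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def pop(self):
        # None means the stack is empty and no prediction is available.
        return self.entries.pop() if self.entries else None
```

If the same subroutine is called from two sites whose return addresses are 0x104 and 0x204, the single return instruction pops 0x104 after the first call and 0x204 after the second. A cache that remembers one target per return instruction could not supply both.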

In the Version 4 ColdFire implementation, the IFP acceleration logic scans the stream of prefetched instructions looking for subroutine call and return instructions. When a call is detected, the IFP calculates the return address and immediately pushes it onto the hardware return stack. Likewise, when a subroutine return is prefetched, the top of the hardware return stack is popped and used as the target fetch address.
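The scan-push-pop sequence can be sketched as a prefetch loop over a toy instruction stream. Everything named here is hypothetical: the (length, kind, target) tuples, the function name, and the choice to keep fetching sequentially when a return is seen with an empty stack. The return address is taken to be the address of the instruction following the call.

```python
def prefetch_with_return_stack(memory, start, max_instructions):
    """Follow calls and returns at prefetch time using a return stack.

    memory maps an instruction address to (length, kind, target), where
    kind is "op", "call", or "return". The encoding is illustrative only.
    Returns the list of instruction addresses prefetched, in order.
    """
    return_stack = []
    fetched = []
    address = start
    while len(fetched) < max_instructions and address in memory:
        length, kind, target = memory[address]
        fetched.append(address)
        if kind == "call":
            # Return address: the instruction immediately after the call.
            return_stack.append(address + length)
            address = target
        elif kind == "return" and return_stack:
            # The popped entry becomes the next fetch address.
            address = return_stack.pop()
        else:
            # Assumed fallback, also used for a return with an empty stack.
            address = address + length
    return fetched
```

In a stream where one subroutine at 0x200 is called from 0x100 and again from 0x108, the model resumes prefetching at 0x106 after the first return and at 0x10E after the second.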

As a result of the decoupled instruction fetch and operand execution pipeline structure of all ColdFire processors, it is possible for the IFP to push entries onto, or pop entries from, the hardware return stack on behalf of instructions that are not actually executed due to wrong-way conditional branches, exceptions, etc. To correctly handle these types of operations, the Operand Execution Pipeline maintains a "master copy" of t...