Browse Prior Art Database

AN ALGORITHM FOR INSTRUCTION CACHE ARBITRATION AMONG MULTIPLE INSTRUCTION STREAMS

IP.com Disclosure Number: IPCOM000007999D
Original Publication Date: 1997-Mar-01
Included in the Prior Art Database: 2002-May-10
Document File: 4 page(s) / 198K

Publishing Venue

Motorola

Related People

Ralph McGarity: AUTHOR [+3]

Abstract

In processor designs utilizing branch prediction, imperfect branch prediction accuracy leads to per- formance degradation. The severity of the perfor- mance degradation depends on the prediction accu- racy and the penalty per misprediction. This paper presents an approach which focuses on reducing the penalty per misprediction to enhance processor per- formance.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 34% of the total text.

Page 1 of 4

MOTOROLA Technical Developments

AN ALGORITHM FOR INSTRUCTION CACHE ARBITRATION AMONG MULTIPLE INSTRUCTION STREAMS

by Ralph McGarity, Terence Potter and Tom Thomas

INTRODUCTION

  In processor designs utilizing branch prediction, imperfect branch prediction accuracy leads to per- formance degradation. The severity of the perfor- mance degradation depends on the prediction accu- racy and the penalty per misprediction. This paper presents an approach which focuses on reducing the penalty per misprediction to enhance processor per- formance.

   Reducing the misprediction penalty by using multiple instruction streams has been widely dis- cussed [Superscalar Microprocessor Design, Mike Johnson, pp. 69-711. This technique involves repli- cating machine resources in order to process both the taken and not-taken paths of a conditional branch. A significant drawback to this technique is the amount of hardware required to effect the penal- ty reduction. In particular, replicating the instruction cache, instruction decoders, and execution resources is prohibitively expensive. Less aggres- sive implementations of the multiple instruction streams technique restrict the resource replication to just the instruction cache and perhaps the instruc- tion decode logic.

  An alternative approach to supporting multiple instruction streams is to time multiplex use of exist- ing resources rather than replicating them for the alternate branch path(s). For instance, time multi- plexed use of a single one-ported instruction cache can support the fetching of multiple instruction streams into the processor core. While the time mul- tiplexing approach offers a reduced hardware cost in comparison to the hardware replication approach, it does not provide all the performance benefits of that approach. In the obvious implementation, when a conditional branch is encountered the predicted and alternate branch paths access the instruction cache on every-other cycle. In this scheme both the

predicted and alternate paths are processed at roughly one-half of the original machine's efficien- cy until the branch is resolved. This scheme has severe limitations, especially if the prediction mechanism is correct the vast majority of times.

THE PROPOSED ALGORITHM FOR INSTRUCTION CACHE ARBITRATION

  A variation on the instruction cache time multi- plexing approach involves using an algorithm to arbitrate between the fetch requests from the pre- dicted and one or more alternate paths. To maxi- mize performance, the algorithm proposed selects between the predicted and alternate path fetch requests based on a number of factors including pipeline interlocks downstream from the fetcher, the number of outstanding predicted branches currently in the machine, the presence of an instruction cache miss, and the presence of an instruction TLB miss.

  The algorithm detailed below describes when to dedicate one cycle to fetch the alternate path of the oldest branch in the processor instead of the predi...