Browse Prior Art Database

Power and Performance Improvements Using a Selective Branch Target Buffer Entry Replacement

IP.com Disclosure Number: IPCOM000146501D
Publication Date: 2007-Feb-13
Document File: 3 page(s) / 21K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method that provides significant performance and power advantages on designs that use a branch target buffer (or a similar structure) to predict branches. Benefits include significantly improve performance over a Less Recently Used (LRU) based scheme.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

 

Power and Performance Improvements Using a Selective Branch Target Buffer Entry Replacement

Disclosed is a method that provides significant performance and power advantages on designs that use a branch target buffer (or a similar structure) to predict branches. Benefits include significantly improve performance over a Less Recently Used (LRU) based scheme.

Background

Currently, microprocessors use pipelining to process several instructions in parallel. Branch instructions, which are changes in control flow, can have an adverse impact on the performance of the pipeline if they are not predicted correctly. Branch behavior is governed by two elements: direction (taken/not taken) and target. Branch targets are either encoded in the instruction (i.e. directs, conditionals), implied (i.e. returns), or are determined non-statically by program variables (i.e. indirects via registers or memory). Different branch types have varied predictability and performance impacts on a given processor architecture. Branch target buffers (BTB) are used to store targets to predict branches early in the pipeline. A BTB is a fixed size structure and employs a scheme to choose which entry to replace when a new target is inserted. Typical replacement schemes include LRU, Pseudo Least Recently Used (p-LRU) or random. These schemes only consider the latency from the last use of an entry, and do not account for the relative importance of different branch types.

General Description

The disclosed method adds another option to the current state of the art; it uses branch type information from the new allocating branch and/or the branch types of the existing entries instead of (or in conjunction with) the LRU. It recognizes that, for a given architecture, not all branches are equal. An algorithm that accounts for the LRU, the branch types, and the number of times entries have avoided replacement is used for entry selection. This allows the BTB to retain branch types that are more important to predict accurately. This scheme makes the BTB more selective, and thereby allows a smaller BTB to perform, as well as a larger and/or more associative BTB.

For example, some processors use a BTB early in the pipeline, but are able to make another prediction after decoding the instruction for some branch types. This is done for all branches except indirect branches, where the decode stage does not provide target information. In this example, if a direct branch is predicted incorrectly by the BTB, it can be corrected a few stages later when the instruction i...