Correlating Lookahead Branch Prediction to Instruction Fetching in a High Frequency Superscalar Microprocessor

IP.com Disclosure Number: IPCOM000236655D
Publication Date: 2014-May-07
Document File: 5 page(s) / 51K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to reduce delays in Instruction-Fetching (I-Fetching) when Lookahead Branch Prediction is used. The method incorporates a Prediction Marking Queue (PMQ) that queues up Branch Target Buffer (BTB)/Branch History Table (BHT) prediction marking results that arrive before the instructions to which the predictions are correlated; thus, I-Fetching is not delayed while waiting for BTB results.

This is the abbreviated version, containing approximately 20% of the total text.

When Lookahead Branch Prediction is used, with Branch Prediction Lookups and Instruction Fetching (I-fetching) running independently, it is assumed that I-fetching performance and efficiency improve when the Branch Target Buffer (BTB) and/or Branch History Table (BHT) is "ahead". That is, the BTB and/or the BHT has already searched the location where I-fetch is currently fetching and can "redirect" I-fetching as quickly as possible to the new stream at the target of a predicted branch, after the branch itself is fetched. However, when the BTB lookup latency is significant, I-fetching only when the BTB/BHT result is known can hurt performance. I-fetching occurs very early in the pipeline, so it is delayed whenever it must wait for the BTB result to be ready, possibly by many cycles in certain circumstances (e.g., shortly after the BTB and I-fetch both start from the same point at the same time).

A better option is to place the branch marking point, the point at which the BTB/BHT result is actually used to mark branches (e.g., as guessed taken or not taken), as late as possible in the instruction pipeline. This minimizes the need to halt instruction processing in order to wait for BTB/BHT results to catch up to this point. In fact, the opposite is more common: one or more BTB/BHT results are instead waiting for instructions to reach the branch marking point of the pipeline.

If the branch marking point in the pipeline is later than I-fetching, then a separate structure is required to buffer/queue the BTB/BHT results while they wait for instructions to reach the branch marking point. The solution must minimize any special requirements on, or general hampering of, the throughput of BTB/BHT results and of instruction delivery through this branch marking point. This last requirement poses special problems for high frequency, highly superscalar microprocessor pipelines.

The novel idea is to have a Prediction Marking Queue (PMQ) that queues up BTB/BHT prediction marking results that arrive before the instructions to which the predictions are correlated. The PMQ must be linked to logic that identifies which instructions flowing through the branch marking point are correlated to which prediction marking entries in the PMQ, so that proper branch marking can potentially be performed for every instruction.
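The queueing and correlation described above can be illustrated with a minimal software sketch. It is not the disclosed hardware design: the names `Prediction` and `PredictionMarkingQueue`, the capacity of 8, and the use of the branch address as the correlation tag are all assumptions made for illustration. The sketch also assumes that predictions and instructions both arrive in program order along the predicted path, so that only the oldest queued prediction needs to be compared; flushes and mispredictions are not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    """One BTB/BHT result waiting in the queue (hypothetical layout)."""
    branch_addr: int              # assumed tag: address of the predicted branch
    taken: bool                   # guessed taken or not taken
    target: Optional[int] = None  # predicted target, if taken


class PredictionMarkingQueue:
    """FIFO of predictions that arrived ahead of their instructions."""

    def __init__(self, capacity: int = 8):  # capacity is an assumption
        self.capacity = capacity
        self._entries = deque()

    def __len__(self) -> int:
        return len(self._entries)

    def write(self, pred: Prediction) -> bool:
        """Queue a prediction; return False if the queue is full."""
        if len(self._entries) >= self.capacity:
            return False
        self._entries.append(pred)
        return True

    def mark(self, instr_addr: int) -> Optional[Prediction]:
        """Called when an instruction reaches the branch marking point.

        If the oldest queued prediction is correlated to this instruction,
        dequeue and return it; otherwise the instruction is left unmarked.
        """
        if self._entries and self._entries[0].branch_addr == instr_addr:
            return self._entries.popleft()
        return None


# Two predictions arrive before any instruction reaches the marking point.
pmq = PredictionMarkingQueue()
pmq.write(Prediction(branch_addr=0x108, taken=True, target=0x200))
pmq.write(Prediction(branch_addr=0x204, taken=False))

# Instructions then flow through the marking point along the predicted path.
stream = [0x100, 0x104, 0x108, 0x200, 0x204]
marks = [pmq.mark(addr) for addr in stream]
```

In this sketch instruction fetching never waits on the queue: an instruction with no matching entry simply passes through unmarked, while early predictions sit in the queue until their instruction arrives.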

The embodiment allows multiple instructions (up to six) with:

• Any allowed to be branches
• Multiple instructions (up to two) as taken branches
• Instructions on multiple streams (up to two) on every cycle for branch marking
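As a rough illustration of the per-cycle limits listed above, the following sketch checks whether a group of instructions could pass through the branch marking point in a single cycle. The `MarkedInstr` record, its `stream` field, and the function name `fits_in_one_cycle` are hypothetical; only the three limits (six instructions, two taken branches, two streams) come from the text.

```python
from dataclasses import dataclass
from typing import Sequence

MAX_INSTRUCTIONS = 6    # instructions marked per cycle
MAX_TAKEN_BRANCHES = 2  # taken branches per cycle
MAX_STREAMS = 2         # instruction streams per cycle


@dataclass(frozen=True)
class MarkedInstr:
    """Hypothetical record for one instruction at the marking point."""
    addr: int
    stream: int               # assumed identifier of the instruction stream
    is_branch: bool = False
    taken: bool = False


def fits_in_one_cycle(group: Sequence[MarkedInstr]) -> bool:
    """Return True if the group respects all three per-cycle limits."""
    taken = sum(1 for i in group if i.is_branch and i.taken)
    streams = {i.stream for i in group}
    return (
        len(group) <= MAX_INSTRUCTIONS
        and taken <= MAX_TAKEN_BRANCHES
        and len(streams) <= MAX_STREAMS
    )


# Six branches on two streams, two of them taken: within the limits.
ok_group = [
    MarkedInstr(0x100, stream=0, is_branch=True),
    MarkedInstr(0x104, stream=0, is_branch=True),
    MarkedInstr(0x108, stream=0, is_branch=True, taken=True),
    MarkedInstr(0x200, stream=1, is_branch=True),
    MarkedInstr(0x204, stream=1, is_branch=True),
    MarkedInstr(0x208, stream=1, is_branch=True, taken=True),
]

# A third taken branch (and a third stream) exceeds the limits.
bad_group = [
    MarkedInstr(0x100, stream=0, is_branch=True, taken=True),
    MarkedInstr(0x200, stream=1, is_branch=True, taken=True),
    MarkedInstr(0x300, stream=2, is_branch=True, taken=True),
]
```

Here `fits_in_one_cycle(ok_group)` holds because every instruction may be a branch, while `fits_in_one_cycle(bad_group)` does not.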

The embodiment also supports simultaneous multithreading (SMT) with up to four BTB/BHT result predictions to write the PMQ on every cycle with a maximum of two predictions per thread. The embodiment receives and tracks BTB/BHT prediction results on an OctWord basis wi...