Browse Prior Art Database

Grouping of Static and Predictable Conditional Branches

IP.com Disclosure Number: IPCOM000199651D
Publication Date: 2010-Sep-14
Document File: 4 page(s) / 46K

Publishing Venue

The IP.com Prior Art Database

Abstract

All modern micro-processors have Dynamic Branch Predictors (DBPs) that consist of Branch History Tables (BHTs). It is possible that more than one active branch is mapped to the same entry. The large size of the BHTs (thousands of entries) should alleviate this problem. However, large scale enterprise applications can exhaust even these resources. Our solution groups together easy to predict branches and reduces the number of entries used. Thus, harder to predict branches have more entries available.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 51% of the total text.

Page 1 of 4

All modern micro-processors

                   have Dynamic Branch Predictors (DBPs). The DBPs predict both direction (taken/not taken) and branch target.Most are constructed from a series of Branch History Tables (BHTs) that contain the outcomes of previous predictions of the same branch.

Most DBPs are tagless i.e. the current Instruction Pointer (IP) or some combination of it and the Global Branch History (GBH) are used to index the BHTs but no test is made to certify that the prediction in the table is for the same branch that is predicted. Thus, it is possible that more than one active branch is mapped to the same entry when their LSBs are the same.

The large size of the BHTs (thousands of entries) should alleviate this problem. However, large scale enterprise applications (DBs, ERPs) can exhaust even these resources. Therefore, a solution to this problem has to be made at the software level.

    The solution we present here is in the realm of processors that have no static branch prediction capabilities or where they exist but they still occupy BHT slots. The idea is to group together branches that are either predicted statically or have an high degree of probability of being predicted correctly.

All branches that are predicted taken

                        (statically or with a high degree of probability) are positioned at addresses that will map them to the same BHT slot. For example, for a BHT of 64 entries addresses that end with 0x14

will be mapped to the same

slot. Thus, the compiler or post-link optimizer can try to map the aforementioned branches to the same slot. This solution will work for a small BHT, for larger BHTs several slots will be reserved for these branches.

Current solutions include:


Besides the obvious enlarging of the tables there are several solutions that split

1.

the BHT into several smaller BHTs in order to have each handle a different "class" of branches.

All these solutions are hardware based.

Re-order some of the branches to minimize the collisions of branches. This

necessitates that the compiler or other (post-link) optimizer keep track of all branches and integrate the position shifts with many other optimizations. In addition this solution doesn't reduce the number of entries used, it just resolves conflicts.

Convert conditional branches to un-conditional ones. However, this approach is


3.

less effective as it is usually done anyway by the compiler when possible. Trying to force additional conversions may add, instead of reduce, the number of branches.

Static branch-prediction - many architectures enable the compiler to specify (by

setting bits in the instruction) that the conditional branch should be statically predicted taken based on program and/or profile based analysis. For some processors this is an advantage as the statically predicted branches aren't inserted into the BHTs, making more room for dynamically predicted branches. However, in other processors, they still occupy BHT slots so the possibility of branch collision still exis...