Grouping of Static and Predictable Conditional Branches
Publication Date: 2010-Sep-14
The IP.com Prior Art Database
All modern micro-processors have Dynamic Branch Predictors (DBPs) that consist of Branch History Tables (BHTs). It is possible that more than one active branch is mapped to the same entry. The large size of the BHTs (thousands of entries) should alleviate this problem. However, large scale enterprise applications can exhaust even these resources. Our solution groups together easy to predict branches and reduces the number of entries used. Thus, harder to predict branches have more entries available.
All modern micro-processors
have Dynamic Branch Predictors (DBPs). The DBPs predict both direction (taken/not taken) and branch target.Most are constructed from a series of Branch History Tables (BHTs) that contain the outcomes of previous predictions of the same branch.
Most DBPs are tagless i.e. the current Instruction Pointer (IP) or some combination of it and the Global Branch History (GBH) are used to index the BHTs but no test is made to certify that the prediction in the table is for the same branch that is predicted. Thus, it is possible that more than one active branch is mapped to the same entry when their LSBs are the same.
The large size of the BHTs (thousands of entries) should alleviate this problem. However, large scale enterprise applications (DBs, ERPs) can exhaust even these resources. Therefore, a solution to this problem has to be made at the software level.
The solution we present here is in the realm of processors that have no static branch prediction capabilities or where they exist but they still occupy BHT slots. The idea is to group together branches that are either predicted statically or have an high degree of probability of being predicted correctly.
All branches that are predicted taken
(statically or with a high degree of probability) are positioned at addresses that will map them to the same BHT slot. For example, for a BHT of 64 entries addresses that end with 0x14
will be mapped to the same
slot. Thus, the compiler or post-link optimizer can try to map the aforementioned branches to the same slot. This solution will work for a small BHT, for larger BHTs several slots will be reserved for these branches.
Current solutions include:
Besides the obvious enlarging of the tables there are several solutions that split
the BHT into several smaller BHTs in order to have each handle a different "class" of branches.
All these solutions are hardware based.
Re-order some of the branches to minimize the collisions of branches. This
necessitates that the compiler or other (post-link) optimizer keep track of all branches and integrate the position shifts with many other optimizations. In addition this solution doesn't reduce the number of entries used, it just resolves conflicts.
Convert conditional branches to un-conditional ones. However, this approach is
less effective as it is usually done anyway by the compiler when possible. Trying to force additional conversions may add, instead of reduce, the number of branches.
Static branch-prediction - many architectures enable the compiler to specify (by
setting bits in the instruction) that the conditional branch should be statically predicted taken based on program and/or profile based analysis. For some processors this is an advantage as the statically predicted branches aren't inserted into the BHTs, making more room for dynamically predicted branches. However, in other processors, they still occupy BHT slots so the possibility of branch collision still exis...