Merged Branch Prediction Algorithm and Array Word Line Decode.
Original Publication Date: 2001-Apr-26
Included in the Prior Art Database: 2003-Jun-19
Disclosed is a method to improve performance of branch prediction circuitry by merging the prediction algorithm with the array word line decoders. This method does not require the branch prediction algorithm to produce a binary encoded value to address an array where branch prediction information is stored. In the new method the branch prediction algorithm is altered so that decoded array word lines are directly realized. As branch prediction algorithms become more complex, and processor speeds increase the need to do branch prediction more efficiently has become necessary. In a traditional super-scalar processors information used to do branch prediction is first gathered to a central point. This information is usually the program counter value of the branch instruction, a global history vector of past branch directions, and other processor status information. The other processor status may be the last data value loaded, condition register bits, or other processor registers. This information is then compacted using some algorithm into a binary value used to index into an array. There has been much academic literature on algorithms and processor status information used to produce this binary encode. The array then takes the binary encoded address and decodes it into word lines to access array elements. The array elements contain branch history patterns or saturating counters used to predict branch direction. It is desirable to be able to perform this operation in one processor cycle and not have this function gate processor cycle time. Fig. 1 shows a traditional branch prediction mechanism. In the example x bits of the program counter plus y bits of a global branch history register plus z bits of other processor status are used to generate a index of n bits in length. The algorithm used to produce the n bits Index(0:n)=f1(x,y,z) is not pertinent to the invention, but can be assumed to be implemented as a two level sum of products. Inside the array index(0:n) needs to be inverted and repowered so the word line decoders can select the correct array elements . This can be represented : WL(0:k)=f2(Index(0:n)) where f2 is a simple binary decode. This inversion, repower and decode will require at least two levels of logic. Fig. 2 shows an improved implementation were the binary compaction/expansion of the information used to drive the word line decoders is eliminated. Instead of having two separate pieces of logic implementing Index(0:n)=f1(x,y,z) then WL(0:k)=f2(index(0:n)) , a single sum of product is implement to produce WL(0:k)=f3(x,y,z) directly. Depending on the branch prediction algorithm used this can lead to logic level and area reduction and therefore improve performance.