Browse Prior Art Database

A digital hardware circuit to calculate on the fly accumulated probability

IP.com Disclosure Number: IPCOM000127263D
Original Publication Date: 2005-Aug-19
Included in the Prior Art Database: 2005-Aug-19
Document File: 3 page(s) / 96K

Publishing Venue

IBM

Abstract

Presented is a platform circuit that enables better decision making by calculating on-the-fly accumulated probability.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 3

A digital hardware circuit to calculate on the fly accumulated probability

Presented is a platform circuit that enables better decision making by calculating on-the-fly accumulated probability. This article sticks to usages in a processor's fetch unit, although the same platform can be used elsewhere as well.

    The idea is to take each statistical input (e.g. branch confidence level) in its turn, translate it to probability P, and calculate the product of all the unresolved Ps. This number is the accumulative probability and hence represent the risk of misprediction in the examples presented.

    Multiplication is a costly circuit for fast decision making in processors, so since one is interested only in comparing the accumulative probability (e.g. in comparing risks of a thread to a threshold or different threads to each other), the "multiplication" is done via fast single cycle addition/subtraction operations.

Background for the fetch unit:

    Every program has branches, and virtually all processors perform speculation control and hence speculate each branch they encounter and act upon it. In recent years, branch confidence estimation algorithms have been researched, which tag each branch speculation with a level of confidence.

    Note that modern processors have multiple branches in their instruction window in parallel, but the confidence estimation is done separately per branch. Problem examples:

When to use pipeline gating, hence when is it best to gate in order to reduce


1.

power waste with minimal or no performance lost?

In SMT, when is it best to choose which thread to work on?


2.

Known solutions (to the 2 examples):

Don't use pipeline gating or SMT at all. This category was the majority, but is shrinking with time.

Use a single low-probability event to decide on the overall risk (risk=accumulative probability)

  Examples: Gating - "Assigning Confidence to Conditional Branch Predictions", by Erik Jacobsen, Eric Rotenbergf, and J. E. Smith SMT - "Branch Classification to Control Instruction Fetch in Simultaneous Multithreaded Architectures". by A. Ramirez, F. Latorre, J. Larriba, M. Valero For SMT - use Round Robin, ICOUNT or other algorithms, e.g. "A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors" by Ayose Falc´on Alex Ramirez Mateo Valero

    Take the speculation control risk factor into account, Examples: Gating - "Pipeline Gating: Speculation Control For Energy Reduction", by Srilatha Manne Artur Klauser and Dirk Grunwald SMT - "Boosting SMT Performance by Speculation Control", by Kun Luo, Manoj Franklin, Shubhendu S. Mukherjee and Andre Sezne Drawbacks:

    The mentioned algorithm take into account either minimal statistic information or none whatsoever and by that achieve a sub-optimal decision. The last category achieves the best results, but allows only hard (binary) statistical events, hen...