Method for reducing communications in a clustered processor by means of occupancy-aware steering
Publication Date: 2005-Aug-17
Publishing Venue
The IP.com Prior Art Database
Abstract
Disclosed is a method for reducing communications in a clustered processor by means of occupancy-aware steering. Benefits include improved functionality and improved performance.
Background
Clustered microarchitectures are a key paradigm for next-generation processors because they effectively address challenges such as wire delays, power density, and temperature distribution. One of the major drawbacks of these architectures is the cost of communicating values from one cluster to another.
A key factor in the performance of clustered processors is the latency of communicating register values among clusters. The steering engine, placed in the dispatch stage, determines the destination cluster of each instruction. Typically, this engine attempts to minimize communications by steering an instruction to the cluster that holds most of its inputs. If the instruction cannot be steered to the most preferred cluster, the dispatch of instructions is stalled. This approach causes a ~30% performance degradation compared to a method that probes the rest of the clusters. Alternatively, instructions can be steered to another cluster when the optimal cluster cannot receive any more instructions that cycle. In that case, the number of communications increases because non-optimal clusters may not hold any of the inputs.
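The two conventional policies described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure. The names `communications_needed` and `steer_baseline`, the `valid_copies` map (logical register to the set of clusters holding a valid copy), and the `free_slots` list are all hypothetical, and ties are assumed to be broken by lowest cluster index.

```python
from typing import Dict, List, Optional, Set


def communications_needed(inputs: List[str],
                          valid_copies: Dict[str, Set[int]],
                          cluster: int) -> int:
    """Count the input registers that have no valid copy in `cluster`.

    Each such register would need one inter-cluster communication.
    """
    return sum(1 for reg in inputs
               if cluster not in valid_copies.get(reg, set()))


def steer_baseline(inputs: List[str],
                   valid_copies: Dict[str, Set[int]],
                   free_slots: List[int],
                   probe_others: bool) -> Optional[int]:
    """Return the chosen cluster, or None to signal a dispatch stall.

    probe_others=False: stall whenever the preferred cluster is full.
    probe_others=True:  fall back to the next-best cluster with room.
    """
    clusters = range(len(free_slots))
    # Fewest required communications first (stable sort keeps index order on ties).
    ranked = sorted(
        clusters,
        key=lambda c: communications_needed(inputs, valid_copies, c))
    preferred = ranked[0]
    if free_slots[preferred] > 0:
        return preferred
    if not probe_others:
        return None  # stall dispatch
    for c in ranked[1:]:
        if free_slots[c] > 0:
            return c  # accepts extra communications to avoid the stall
    return None  # every cluster is full
```

With both inputs valid only in cluster 0 and cluster 0 full, the stalling policy returns `None`, while the probing policy picks cluster 1 at the cost of additional communications.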
General description
The disclosed method includes a steering technique that reduces intercluster register communications. When the backend is almost full, the eligible clusters are limited to those that minimize communications. As a result, communication is reduced and performance is improved.
The key elements of the method include:
• Mechanism to determine the number of communications required when the instruction is steered to a particular cluster
• Mechanism to identify the load of the different queues and schedulers in the backends
• Clustered backends with schedulers and execution units
• Front-end with mechanism to identify if a cluster has a valid copy of each logical register
• Independent renaming of input instruction registers for each cluster, using the Register Rename Table
• Special COPY instruction for distributing register values among clusters
• Sorting of destination clusters for an instruction according to the steering criteria and stalling of the dispatch stage only when the backends are full and the destination cluster is not the preferred one
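One possible reading of the sorting and stalling element can be sketched in Python. This is an illustration under stated assumptions, not the disclosed implementation. The function name `steer_occupancy_aware`, the aggregate occupancy measure, and the 0.75 "almost full" threshold are hypothetical, and the disclosure does not specify how occupancy is measured or what threshold is used.

```python
from typing import Dict, List, Optional, Set


def steer_occupancy_aware(inputs: List[str],
                          valid_copies: Dict[str, Set[int]],
                          occupancy: List[int],
                          capacity: int,
                          almost_full: float = 0.75) -> Optional[int]:
    """Return the chosen cluster, or None to signal a dispatch stall.

    occupancy[c] is the number of occupied scheduler entries in cluster c;
    capacity is the (assumed uniform) number of entries per cluster.
    """
    n = len(occupancy)
    # Communications required if the instruction is steered to each cluster.
    comms = [sum(1 for r in inputs if c not in valid_copies.get(r, set()))
             for c in range(n)]
    # Sort candidates: fewest communications first, then lightest load.
    ranked = sorted(range(n), key=lambda c: (comms[c], occupancy[c]))

    # Assumed occupancy measure: fraction of all backend entries in use.
    backend_load = sum(occupancy) / (capacity * n)
    if backend_load >= almost_full:
        # Backend almost full: keep only communication-minimal clusters.
        best = comms[ranked[0]]
        ranked = [c for c in ranked if comms[c] == best]

    for c in ranked:
        if occupancy[c] < capacity:
            return c
    return None  # no eligible cluster has room: stall dispatch
```

Under these assumptions, a lightly loaded backend behaves like the probing policy, while a nearly full backend stalls instead of steering to a cluster that would require extra communications.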
Advantages
The disclosed method provides advantages, including:
• Improved functionality due to providing the Register Rename Table
• Improved functionality due to providing the COPY instruction for distributing register values among clusters
• Improved performance due to reducing the total n...