P9NEST, - FB06 - Method of Managing Coherency in a 3-Hop Topology
Publication Date: 2017-May-15
The IP.com Prior Art Database
Method of Managing Coherency in a 3-Hop Topology
ABSTRACT
Disclosed is a method of managing coherency in a 3-hop topology.
The POWER* processors implement a non-blocking snooping protocol. This enables scaling
ever-larger n-way SMP systems which are typically limited by message queuing depth and
limitations in coherency bandwidth in message passing snooping-based coherency protocols. In
non-blocking snooping protocols, caching agents’ requests are temporally bounded. When a
request is broadcast, it has a guaranteed fixed time in which all snoopers respond. Once a request
is placed on the coherency network, there is essentially no queuing. This facilitates running the
coherency network at very high utilization. Therefore, increasing the overall network bandwidth
has a direct effect on the system capacity to do work.
Very large 3-hop n-way SMP systems are especially difficult to manage for non-blocking
snooping protocols. Since queuing facilities are minimal, the coherency network must divide the
available coherency bandwidth evenly amongst the requesters. Furthermore, not all chips in the
system are the target of coherency broadcasts, intentionally due to selective broadcast or
unintentionally due to overcommits. Once requests are placed on the coherency network, each
request must keep track of the chips in the system to which it was broadcast. The snooper partial
responses must be returned in the exact reverse order. Finally, the combined responses must be
broadcast to the same chips in the system as the original requests, and in the same order. The
problem and subject of this disclosure is how to manage the coherency broadcast and necessary
The POWER9* 3-hop topology includes chips that are fully connected within a group via external
SMP X-buses (intra-group). The groups are fully interconnected with one another via external SMP
A-buses (inter-group). However, the chips in a group are not fully interconnected with the chips in a
remote group: only one chip within each group connects to a remote group. The POWER9
processor designates each stage in the coherency broadcast based on the position within the 3-
hop topology. The requesting caching agent is designated to be on the Local Master (LM) chip.
The chip that connects the local group to a remote group is designated to be the Local Hub (LH).
The remote group receiving chip is designated as the Remote Hub (RH). Finally, the remote
chip(s) on the remote group are designated as the Remote Leaf (RL).
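The role designations above can be illustrated with a minimal sketch. It is not taken from the POWER9 design: the Chip type, the example_wiring table (which chip in a group owns the A-bus to which remote group), and the assign_roles function are all hypothetical names chosen for illustration. The sketch also assumes that the LM chip may itself act as an LH, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    group: int   # group number in the system
    index: int   # chip position within its group


def example_wiring(num_groups, chips_per_group):
    """Hypothetical wiring table (an assumption, not the real POWER9 layout):
    in group g, chip (r % chips_per_group) owns the A-bus to remote group r."""
    return {
        (g, r): r % chips_per_group
        for g in range(num_groups)
        for r in range(num_groups)
        if g != r
    }


def assign_roles(master, num_groups, chips_per_group, hub_for):
    """Label every chip with its role for a broadcast issued by `master`."""
    roles = {}
    for g in range(num_groups):
        for i in range(chips_per_group):
            labels = []
            if g == master.group:
                if i == master.index:
                    labels.append("LM")  # chip holding the requesting agent
                # A local chip is an LH if it connects to any remote group.
                if any(hub_for[(g, r)] == i
                       for r in range(num_groups) if r != g):
                    labels.append("LH")
            elif hub_for[(g, master.group)] == i:
                labels.append("RH")      # remote chip wired to the local group
            else:
                labels.append("RL")      # remaining chips in the remote group
            roles[Chip(g, i)] = labels
    return roles


if __name__ == "__main__":
    roles = assign_roles(Chip(0, 0), 4, 4, example_wiring(4, 4))
    for chip in sorted(roles, key=lambda c: (c.group, c.index)):
        print(chip, roles[chip])
```

Under the assumed wiring, a request from chip 0 of group 0 reaches each remote group through a different LH, and each remote group has exactly one RH with the rest of its chips acting as RLs.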
The LH is allocated equally among all the chips in the local group. In the case of a 4-chip
group, each LM would have 25% of the available coherency bandwidth to issue requests beyond
the local group. The LH includes a tracking structure that keeps track of the originating LM, as
well as the X-bus to which it is connected. The LH tracking structure (SPLH presp FIFO) also keeps track
of the A-buses on which the command is broadcast. This could be all A-buses or none of the A-buses and
depends on the scope and...
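As a rough illustration of the LH bookkeeping described above, the following sketch models an equal-share slot arbiter and a FIFO of tracking entries. It is a simplified model under stated assumptions, not the SPLH presp FIFO itself: the class and field names are hypothetical, round-robin is assumed as one way to give each LM an equal share, and entries are assumed to retire in the order they were recorded.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackEntry:
    lm_chip: int        # originating Local Master chip
    x_bus: int          # X-bus connecting that LM to this LH
    a_buses: frozenset  # A-buses the command was broadcast on (may be empty)


class LocalHubTracker:
    """Hypothetical model of an LH: equal-share slots plus an in-order FIFO."""

    def __init__(self, num_lm_chips):
        self.num_lm_chips = num_lm_chips
        self.fifo = deque()
        self.slot = 0

    def owner_of_next_slot(self):
        # Round-robin (an assumption): each LM owns every num_lm_chips-th slot.
        owner = self.slot % self.num_lm_chips
        self.slot += 1
        return owner

    def record_broadcast(self, lm_chip, x_bus, a_buses):
        # Remember where the request came from and where it was sent.
        entry = TrackEntry(lm_chip, x_bus, frozenset(a_buses))
        self.fifo.append(entry)
        return entry

    def retire_oldest(self):
        # Assumed in-order retirement: the oldest entry tells the LH which
        # A-buses to expect responses from and which X-bus leads back to the LM.
        return self.fifo.popleft()


if __name__ == "__main__":
    lh = LocalHubTracker(num_lm_chips=4)
    owners = [lh.owner_of_next_slot() for _ in range(8)]
    print("share of LM 0:", owners.count(0) / len(owners))
    lh.record_broadcast(lm_chip=2, x_bus=1, a_buses={0, 1, 2})
    lh.record_broadcast(lm_chip=3, x_bus=2, a_buses=set())
    print(lh.retire_oldest())
```

With four LM chips, each one owns every fourth slot, which corresponds to the 25% share in the 4-chip example. The empty a_buses set in the second entry models a command that is broadcast on none of the A-buses.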