
Parallel Arbitration For Crossbar Switches

IP.com Disclosure Number: IPCOM000132521D
Original Publication Date: 2005-Dec-20
Included in the Prior Art Database: 2005-Dec-20
Document File: 9 page(s) / 122K

Publishing Venue

IBM

Abstract

This article proposes a novel parallel matching method applied in crossbar-based packet-switching devices. It allows high-quality matchings to be computed over multiple time slots and to be issued in every time slot. The method improves on the existing pipelined implementations by eliminating communication overhead between pipelining stages. As a result, it achieves lower pipeline latency, can complete more matching iterations over a given time period, and is more readily implemented in fast hardware.



Parallel Arbitration For Crossbar Switches

In practice, arbiters for packet switches with unbuffered crossbars typically employ iterative bipartite graph matching algorithms. Especially when the line rate is high, the packets are short, and the number of ports (N) is large, the product of the number of iterations required to achieve good performance and the duration of one iteration can exceed the packet duration. In such a case, it is beneficial to pipeline the matching process so that one high-quality matching is obtained in every time slot. In C. Minkenberg, I. Iliadis, F. Abel, 'Low-latency pipelined crossbar arbitration', in Proc. GLOBECOM 2004, Dallas, TX, Nov. 29-Dec. 12, 2004, such a method for the pipelined computation of a bipartite graph matching in an arbitrated crossbar packet switch is described, also referred to as Fast Low-latency Parallel Pipelined aRbitration (FLPPR).
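To make this criterion concrete, the following minimal Python sketch checks whether the matching computation fits within one packet time slot. All figures are hypothetical and are not taken from the disclosure.

# Hypothetical figures only; none are taken from the disclosure.
LINE_RATE_GBPS = 40        # assumed line rate per port
PACKET_BITS = 64 * 8       # assumed 64-byte cells
ITERATIONS_NEEDED = 4      # assumed iterations for a good matching
ITER_DURATION_NS = 10.0    # assumed duration of one matching iteration

packet_duration_ns = PACKET_BITS / LINE_RATE_GBPS   # bits / (Gbit/s) = ns
matching_time_ns = ITERATIONS_NEEDED * ITER_DURATION_NS

print(f"packet duration: {packet_duration_ns:.1f} ns")   # 12.8 ns
print(f"matching time:   {matching_time_ns:.1f} ns")     # 40.0 ns
if matching_time_ns > packet_duration_ns:
    print("matching does not fit in one time slot -> pipeline the arbitration")

Under these assumed numbers the matching takes roughly three times longer than one packet slot, which is exactly the situation in which pipelining the arbitration becomes attractive.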

The pipeline comprises a number of stages (K), each stage receiving requests for transmission. A request corresponds to one or more packets being present for a specific input-output combination. The computation proceeds in multiple steps. In each step, every stage computes a new matching based on the requests it has received and the result of the previous stage. The matching produced by a given stage of the pipeline is communicated to the next stage of the pipeline before the next step.
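For illustration, the following Python sketch models the stage-to-stage organization described above. All names are illustrative and not taken from the disclosure; compute_matching() is only a greedy placeholder for one iteration of an actual matching algorithm.

K = 4   # number of pipeline stages (illustrative)
N = 8   # switch degree (illustrative)

def compute_matching(requests, inherited):
    """Placeholder for one matching iteration: greedily extend the matching
    inherited from the previous stage with non-conflicting requests.
    `requests` is an N x N boolean matrix; the matching maps input -> output."""
    matching = dict(inherited)
    used_outputs = set(matching.values())
    for inp in range(N):
        if inp in matching:
            continue
        for out in range(N):
            if requests[inp][out] and out not in used_outputs:
                matching[inp] = out
                used_outputs.add(out)
                break
    return matching

def pipeline_step(stage_requests, prev_results):
    """One time slot of the stage-to-stage pipeline: stage k refines the
    matching produced by stage k-1 in the previous slot (stage 0 starts
    from scratch), and the last stage's result is issued as this slot's grant."""
    results = [compute_matching(stage_requests[0], {})]
    for k in range(1, K):
        results.append(compute_matching(stage_requests[k], prev_results[k - 1]))
    grant = results[-1]   # refined over K consecutive time slots
    # `results` must be shipped to the neighbouring stages before the next
    # slot -- this stage-to-stage transfer is the overhead addressed below.
    return grant, results

A caller would invoke pipeline_step() once per time slot, feeding each stage its own request matrix (reflecting that requests may be submitted to any stage) and keeping the returned results as the input for the next slot.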

The key feature of that method is that requests can be submitted in parallel to one or more stages of the pipeline. In the prior art, every request enters at the start of the pipeline, and a corresponding grant cannot be issued before the request has traversed all stages of the pipeline; in the approach proposed by Minkenberg et al., a grant can be issued after a single step, which significantly reduces the pipelining latency.

A drawback of that method is that every stage of the pipeline needs to send a full matching at every step, and every stage except the first needs to receive one. When the switch degree is large, this communication overhead leads to significant implementation limitations and incurs additional latency and cost in terms of pins, wires, and chip area.
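As a rough, purely hypothetical illustration of that overhead, assume a matching is encoded as N output indices of ceil(log2 N) bits each; then K-1 such matchings cross stage boundaries in every time slot:

import math

N, K = 64, 4                                      # illustrative values
bits_per_matching = N * math.ceil(math.log2(N))   # 64 * 6 = 384 bits
bits_per_slot = (K - 1) * bits_per_matching       # 3 * 384 = 1152 bits
print(bits_per_matching, bits_per_slot)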

The method proposed here implements the aforementioned pipelining scheme in such a way that the communication overhead is halved, while the behavior of the scheme remains provably identical (thus preserving all the advantageous properties of the original).

The method eliminates the need to transmit matchings from stage to stage. This is achieved by:
- adding a multiplexer that selects one out of the K matchings to send to the line cards; this matching has completed all steps of the computation;
- adding a pointer to the current (logical) head of the pipeline that is incremented by one (modulo K) after every step.
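The following Python sketch illustrates one possible reading of this organization; names and details are illustrative rather than taken from the disclosure. The K partial matchings stay in their physical slots, a head pointer rotates modulo K, and a multiplexer picks the one matching that has completed all K steps. It reuses the hypothetical compute_matching() placeholder from the earlier sketch, and the routing of requests to logical stages is left to the caller.

class ParallelArbiter:
    """In-place variant: matchings are refined where they sit; only the
    logical head pointer moves, so no matching crosses a stage boundary."""

    def __init__(self, k):
        self.k = k
        self.matchings = [{} for _ in range(k)]   # one matching per slot
        self.head = 0                             # current logical head

    def step(self, slot_requests):
        # Every slot refines its own matching in place.
        for i in range(self.k):
            self.matchings[i] = compute_matching(slot_requests[i],
                                                 self.matchings[i])
        # Advance the logical head (mod K). The slot it now points to has
        # been refined K times; the multiplexer selects it as this slot's
        # grant, and the slot is cleared to start a fresh matching.
        self.head = (self.head + 1) % self.k
        grant = self.matchings[self.head]
        self.matchings[self.head] = {}
        return grant

During the first K-1 slots the issued matchings have naturally undergone fewer than K refinement steps; thereafter every grant has completed the full computation, and no matching is ever transferred between slots.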

Figure 1 compares the high-level block diagram of a pipelined matching process as proposed in the prior art (also referred to as FLPPR v1), with the parallel implementation of the method proposed here (also referred to as FLPPR v...