Reliability mechanism for control protocols in arbitrated crossbar switches

IP.com Disclosure Number: IPCOM000028816D
Original Publication Date: 2004-Jun-03
Included in the Prior Art Database: 2004-Jun-03
Document File: 7 page(s) / 109K

Publishing Venue

IBM

Abstract

We propose a reliability method to protect an imperfect control path (i.e., one with an error rate greater than zero) of packet switches with input queues and a remote centralized arbitration unit (arbiter). The method ensures that the input queue state maintained by the arbiter is consistent with the actual input queue state, taking into account the round-trip time. This method has the following key advantages:

- The method determines whether the queue state maintained by the arbiter is consistent with the actual queue states at the line cards.

- The method also determines the magnitude of any inconsistency, thereby allowing immediate corrective action to be taken, if required.

- The method can cope with control paths of arbitrary length and round-trip time, thus supporting systems of arbitrary physical size.

- The method does not interfere with regular data traffic, i.e., while a check is in progress, data traffic can simultaneously proceed at full bandwidth. The method checks consistency periodically or in response to the detection of a physical error.

- The method entails minimal overhead.

This is the abbreviated version, containing approximately 20% of the total text.



Background

This disclosure is a companion to [7], which proposes an efficient control protocol for arbitrated crossbar switches that enables distributed implementations without sacrificing performance. The background is identical.

The scheme proposed in [7] enables the line cards to connect directly to the routing fabric and the arbiter, where the data path may be either electrical or optical, without the need for additional buffering or control logic close to the switch core. In contrast, the scheme proposed in [2] requires a costly additional stage of buffering close to the switch core, which is especially unattractive when the switch core is optical.

However, the scheme in [7] suffers from reliability issues. If a control message is corrupted as a result of a physical error (or any other error that corrupts the state of the protocol in any way), the protocol may never recover, or may recover only under very specific conditions. As this is not acceptable in a practical system, a mechanism to provide reliability and robustness with respect to errors is required. Specifically, referring to Fig. 2 of [7], it is desired to ascertain that the state of the VOQs 21 at the line cards 2 is consistent with the value of the request counters 61 in the arbiter 6.

The present idea addresses this reliability aspect by means of a census mechanism. In a more general context, this problem is also referred to as obtaining a snapshot [6] of a distributed system to determine whether its global state is consistent. The concept is related to the reliability mechanisms used in conjunction with hop-by-hop credit flow control, as proposed in [5]. The marker used there is akin to the census, in that it travels one complete round trip via the control loop. The main difference is that the scheme in [5] accounts for outstanding credits, whereas the presented scheme accounts for outstanding requests, which leads to different marker/census updating rules, as described below.
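
To make the snapshot idea concrete, the following is a minimal sketch of a marker that travels one round trip through a control loop and accounts for outstanding requests. It is not the updating rule of this disclosure, which is not reproduced in this abbreviated text. It assumes a simplified request/grant protocol (one request per arriving packet, one grant per scheduled packet), FIFO control channels, and a census started by the line card. All names (ControlLoop, start_census, demo, and so on) are hypothetical.

```python
from collections import deque


class ControlLoop:
    """Toy model of one VOQ and its request counter (assumed protocol)."""

    def __init__(self):
        self.voq = 0                # packets queued at the line card
        self.counter = 0            # arbiter's count of pending requests
        self.to_arbiter = deque()   # FIFO channel: line card -> arbiter
        self.to_card = deque()      # FIFO channel: arbiter -> line card
        self.expected = None        # census reference while a check is in flight
        self.mismatch = None        # result of the last completed census

    def packet_arrives(self):
        # Each arriving packet is queued and announced by one request.
        self.voq += 1
        self.to_arbiter.append(("request", None))

    def start_census(self):
        # Reference value: VOQ occupancy at the moment the census is sent.
        self.expected = self.voq
        self.to_arbiter.append(("census", None))

    def arbiter_grant(self):
        # The arbiter schedules one packet if it believes one is pending.
        if self.counter > 0:
            self.counter -= 1
            self.to_card.append(("grant", None))

    def deliver_to_arbiter(self, corrupt=False):
        kind, _ = self.to_arbiter.popleft()
        if corrupt:
            return                  # model a control message lost to an error
        if kind == "request":
            self.counter += 1
        else:
            # The census returns carrying the arbiter's current counter value.
            self.to_card.append(("census", self.counter))

    def deliver_to_card(self):
        kind, value = self.to_card.popleft()
        if kind == "grant":
            self.voq -= 1
            if self.expected is not None:
                self.expected -= 1  # grant seen while the census is in flight
        else:
            # Nonzero means the arbiter's state is off by this many requests.
            self.mismatch = self.expected - value
            self.expected = None


def demo(drop_one_request):
    loop = ControlLoop()
    for _ in range(3):
        loop.packet_arrives()
    loop.deliver_to_arbiter()
    loop.deliver_to_arbiter(corrupt=drop_one_request)
    loop.arbiter_grant()
    loop.start_census()
    loop.packet_arrives()           # traffic continues during the check
    loop.deliver_to_arbiter()       # last request sent before the census
    loop.deliver_to_arbiter()       # census reaches the arbiter
    loop.deliver_to_card()          # grant issued before the census arrived
    loop.deliver_to_card()          # census returns to the line card
    return loop.mismatch
```

In this toy model, the FIFO ordering means every request sent before the census reaches the arbiter before it, and every grant issued before the arbiter's reply reaches the line card before that reply. The difference computed on return is therefore the number of control messages lost, and packets that arrive while the check is in progress do not disturb it.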

Summary of idea

Proposed is a reliability method to protect the control path of packet switches with input queues and a centralized arbitration unit (arbiter). This method has the following key advantages:
- The method determines whether the queue state maintained by the arbiter is consistent with the actual queue states at the line cards.

- The method also determines the magnitude of any inconsistency, thereby allowing immediate corrective action to be taken, if required.

- The method can cope with control paths of arbitrary length and round-trip time, thus supporting systems of arbitrary physical size.

- The method does not interfere with regular data traffic, i.e., while a check is in progress, data traffic can simultaneously proceed at full bandwidth. The method checks consistency periodically or in response to the detection of a physical error.

- The method entails minimal overhead.

System description

See the syst...