Fast Failure Detection and Reconfiguration of a Multi-Blade Switching System
Original Publication Date: 2004-Oct-25
Included in the Prior Art Database: 2004-Oct-25
Many network elements (NEs), such as IP (Internet Protocol) routers, MPLS (Multiprotocol Label Switching) switches, Ethernet switches, SDH (Synchronous Digital Hierarchy) add-drop multiplexers, and radio network controllers, consist of multiple blades plus a central control unit. These subcomponents are connected at the backplane via crossbar switches or other switching elements. Protocol-dependent packet processing is done by the processing element on the incoming blade. Part of packet forwarding is failure detection at the packet, flow, and link level.
Under certain circumstances, the node-internal reaction to a link or node failure may take up to several seconds. This is mainly due to delays in the communication between the central control unit and the blades and between the blades themselves: access to control buses, card polling mechanisms, and the upload and download of information between the control unit and the blades. During this time, the processing element on the affected outgoing card (link break at the output) cannot forward any further packets. The central queuing system of this card may therefore overflow, because the other cards have not yet received information about the failure and continue to receive packets and forward them to this card. Transferring the queued packets after the link is restored additionally consumes link capacity without any benefit. Last but not least, most higher-layer protocols will have timed out, or the connection will already have been released, by the time the link is restored, which makes higher-layer intervention necessary.
On the other hand, the link or peer node failure itself is detected easily and very quickly on the affected line cards, either through physical loss of signal or through the failure detection mechanisms included in a number of protocols such as SDH (part of the SDH frame overhead indicates a number of failures, for instance loss of multiframe).
The remaining problem is the fast and reliable transfer of this failure information to the other port cards that have incoming traffic destined for the port card whose output has failed.
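The effect of the notification delay on the failed card's queue can be illustrated with a rough back-of-the-envelope model. The following Python sketch is not part of the disclosure. The function name, its parameters, and all numeric values are hypothetical and chosen only for illustration. It assumes a constant aggregate packet rate from the other cards and a fixed queue capacity on the failed card.

```python
def packets_during_notification(arrival_rate_pps: int,
                                notify_delay_us: int,
                                queue_capacity: int) -> tuple[int, int]:
    """Estimate what happens at a failed egress card before the
    ingress cards learn about the failure.

    arrival_rate_pps -- assumed aggregate rate (packets/s) at which the
                        other cards keep forwarding to the failed card
    notify_delay_us  -- assumed time (microseconds) until the failure
                        information reaches the other cards
    queue_capacity   -- assumed size (packets) of the failed card's queue

    Returns (queued, dropped): packets held in the queue and packets
    lost to overflow while the egress link cannot drain anything.
    """
    # Integer arithmetic avoids floating-point rounding of the product.
    arrived = arrival_rate_pps * notify_delay_us // 1_000_000
    queued = min(arrived, queue_capacity)
    dropped = arrived - queued
    return queued, dropped


# Illustrative values only: 100k packets/s towards a 10k-packet queue.
slow = packets_during_notification(100_000, 2_000_000, 10_000)  # 2 s delay
fast = packets_during_notification(100_000, 1_000, 10_000)      # 1 ms delay
print("2 s notification delay :", slow)
print("1 ms notification delay:", fast)
```

Under these assumed numbers, a notification delay in the range of seconds overflows the queue many times over, whereas a delay in the range of milliseconds leaves it far from full. This is why the transfer time of the failure information, rather than the detection itself, dominates the outcome.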