
Self Healing Server I/O Connectivity Design

IP.com Disclosure Number: IPCOM000124597D
Original Publication Date: 2005-Apr-29
Included in the Prior Art Database: 2005-Apr-29
Document File: 3 page(s) / 27K

Publishing Venue

IBM

Abstract

When a CPU-to-I/O bus fails, the server is typically dead and must be restarted. Very few existing designs have a failover feature, and those that do still require all buses to be functional in order to restart, and may require an expensive I/O box and cabling to support the feature. These failures do not result in a graceful shutdown, and the server stays down until an unplanned repair action occurs. This design incorporates two new features: an LPC bus attached to the North Bridge, and redundant links between the PCI Host Bridge chips.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 79% of the total text.



Three basic concepts are combined to provide a self-healing I/O attach design:
1) The links between the PCI Host Bridge chips have End-To-End (ETE) failover capability.
2) Loading the BIOS does not use the link from the North Bridge to the South Bridge/LPC bus.
3) The system does not support an external I/O box, so the loop of PCI Host Bridge links can be completed internally, without external cabling.

Basic system block diagram (figure not reproduced). Blocks shown: CPUs, System BIOS ROM, LPC bus, North Bridge Chip, Primary PCI Host Bridge Chip, Secondary PCI Host Bridge Chip, South Bridge. Paths shown: normal path and failover path.
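The normal and failover paths can be sketched as a small graph search. This is a hypothetical Python model, not part of the disclosure: the node names, the `LINKS` tuple, and the `route` function are assumptions, and breadth-first search merely stands in for whatever port-switching logic the ETE failover hardware actually uses. The three links are taken from the failover example in the scenario below: the North Bridge to each PCI Host Bridge chip, plus the chip-to-chip link that closes the loop.

```python
from collections import deque

# Hypothetical node names for the three chips in the link loop.
NB = "North Bridge"
PRIMARY = "Primary PHB"
SECONDARY = "Secondary PHB"

# The three links implied by the failover example. Because they form a
# loop, each PCI Host Bridge chip has two independent routes to the North Bridge.
LINKS = (
    frozenset({NB, PRIMARY}),         # normal path to the primary chip
    frozenset({NB, SECONDARY}),       # normal path to the secondary chip
    frozenset({PRIMARY, SECONDARY}),  # chip-to-chip link used for failover
)


def route(src, dst, failed=frozenset()):
    """Return the shortest path from src to dst over working links, or None.

    Breadth-first search stands in for the hardware's ETE failover, which
    switches ports to an alternate path when a link goes down.
    """
    working = [link for link in LINKS if link not in failed]
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for link in working:
            if node in link:
                (nxt,) = link - {node}  # the chip at the other end of the link
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return None  # no working path: the destination chip is isolated


print(route(NB, PRIMARY))
# ['North Bridge', 'Primary PHB']
print(route(NB, PRIMARY, failed={frozenset({NB, PRIMARY})}))
# ['North Bridge', 'Secondary PHB', 'Primary PHB']
```

Because the three links form a loop, any single link failure still leaves every chip reachable; only a second failure can isolate a chip.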

The details and scenario of the self-healing design are as follows. The system boots to the OS with no hardware problems, and POST/BIOS has enabled the ETE failover capability in the PCI Host Bridge Chip link loop. If any of the links fails, the failover-enabled hardware switches ports and begins using an alternate path. For example, if the link from the North Bridge to the Primary PCI Host Bridge Chip fails, the hardware switches over to the path from the North Bridge through the Secondary PCI Host Bridge Chip to the Primary PCI Host Bridge Chip.

The user is notified of the failure via a message from the Service Processor, which has itself been notified by the POST/BIOS SMI handler that was invoked when the link failed. The OS, however, stays up and running. At the next OS shutdown and restart, fetching the POST/BIOS code is not affected by the still-broken link, since this design has relocated the POST/BIOS ROM to th...
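The notification chain and the restart argument can likewise be sketched in Python. This is again a hypothetical model: `ServiceProcessor`, `System`, `smi_link_failure`, `describe`, and `bios_fetch_ok` are illustrative names, and the BIOS fetch path is an assumption based on the abstract's LPC bus attached to the North Bridge. It shows the failure being recorded and reported while the OS keeps running, and the next BIOS fetch not depending on the failed link.

```python
from dataclasses import dataclass, field

# Hypothetical classes: the disclosure names the roles (SMI handler,
# Service Processor) but defines no software interface for them.


def describe(link):
    """Render a link (a frozenset of two endpoint names) as readable text."""
    return " <-> ".join(sorted(link))


@dataclass
class ServiceProcessor:
    messages: list = field(default_factory=list)

    def notify_user(self, text):
        # Stand-in for the Service Processor's message to the user.
        self.messages.append(text)


@dataclass
class System:
    sp: ServiceProcessor
    failed_links: set = field(default_factory=set)
    os_running: bool = True

    def smi_link_failure(self, link):
        """Model of the POST/BIOS SMI handler invoked on a link failure.

        The hardware has already moved traffic to the failover path, so the
        handler only records the fault and informs the Service Processor;
        the OS is left running.
        """
        self.failed_links.add(link)
        self.sp.notify_user(f"I/O link failed, using failover path: {describe(link)}")


# Assumed BIOS fetch path (ROM on an LPC bus attached to the North Bridge):
# it uses no PCI Host Bridge link and no link to the South Bridge.
BIOS_FETCH_PATH = (
    frozenset({"CPUs", "North Bridge"}),
    frozenset({"North Bridge", "LPC bus"}),
    frozenset({"LPC bus", "System BIOS ROM"}),
)


def bios_fetch_ok(failed_links):
    """True if no link on the assumed BIOS fetch path has failed."""
    return not any(link in failed_links for link in BIOS_FETCH_PATH)


sp = ServiceProcessor()
system = System(sp)
system.smi_link_failure(frozenset({"North Bridge", "Primary PHB"}))
print(sp.messages[0])
# I/O link failed, using failover path: North Bridge <-> Primary PHB
print(system.os_running, bios_fetch_ok(system.failed_links))
# True True
```

The sketch makes the key property explicit: the failed I/O link never appears on the BIOS fetch path, so the next restart can proceed while the link is still broken.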