Centralized, Comprehensive Power-Good Control for Multi-Enclosure SMP System
Publication Date: 2010-Apr-06
The IP.com Prior Art Database
Disclosed is a method for centralized power control for a multi-enclosure system that handles power fault determination and recovery across multiple power boundaries.
Disclosed is a method for centralized, comprehensive power-good control for a multi-enclosure Symmetric Multi-Processor (SMP) system.
For system scalability, we have separately powered enclosures that act as one computer via a processor fabric interconnect. These enclosures consist of multiple electronic cards and/or card assemblies, such as books or nodes. They are powered independently of each other and together constitute an SMP system. The system monitor for the entire multi-power-domain computer resides in one of the enclosures. A redundant system monitor may reside in a second, separately powered enclosure to provide backup system monitoring and control coverage if the primary monitor/controller fails. Each enclosure, as well as the system's entire collection of enclosures, requires comprehensive monitoring of the state of Standby power and System power to determine when a power fault occurs and which recovery action to take. This is not readily achieved across multiple power boundaries and interconnections. The robust new system recovery methods described here grew out of experience with problems that the system control structure (system monitor(s) and enclosure monitor(s)) encountered when confronted with external power line disturbances and power faults.
Prior to this, the system recovery code had to take a 'best guess' at why function was suddenly lost: was the problem power related, or was there a chip or interconnect defect? Choosing the wrong reason for a temporary system failure can result in replacing hardware that is not failing, causing customer downtime and warranty expense. Considerable resources were devoted to debugging Power Line Disturbance (PLD) test results and devising recovery scenarios each time a new set of PLD circumstances was encountered. This approach was inefficient and error-prone.
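To contrast comprehensive power-good monitoring with the old 'best guess', here is a minimal Python sketch. It assumes each enclosure monitor reports a latched Standby power-good and System power-good status at the moment function was lost. The names `EnclosurePowerState`, `Cause`, and `classify_loss` are hypothetical, and checking Standby power before System power is an assumed policy, not something the disclosure specifies. The point is only that hardware is suspected once every enclosure has confirmed that both power domains stayed good.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Tuple


class Cause(Enum):
    STANDBY_POWER_FAULT = "standby power fault"
    SYSTEM_POWER_FAULT = "system power fault"
    NOT_POWER_RELATED = "not power related"


@dataclass(frozen=True)
class EnclosurePowerState:
    enclosure: str
    standby_good: bool  # latched Standby power-good at the time of the failure
    system_good: bool   # latched System power-good at the time of the failure


def classify_loss(states: Sequence[EnclosurePowerState]) -> Tuple[Cause, List[str]]:
    """Attribute a sudden loss of function using power-good status from every enclosure.

    Assumed policy: a Standby power loss anywhere is reported first, then a
    System power loss; only if all enclosures report both domains good is the
    failure treated as a possible chip or interconnect defect.
    """
    standby_lost = [s.enclosure for s in states if not s.standby_good]
    if standby_lost:
        return Cause.STANDBY_POWER_FAULT, standby_lost
    system_lost = [s.enclosure for s in states if not s.system_good]
    if system_lost:
        return Cause.SYSTEM_POWER_FAULT, system_lost
    return Cause.NOT_POWER_RELATED, []


# Example: a four-enclosure system in which enclosure E1 dropped System power.
states = [
    EnclosurePowerState("E0", True, True),
    EnclosurePowerState("E1", True, False),
    EnclosurePowerState("E2", True, True),
    EnclosurePowerState("E3", True, True),
]
cause, affected = classify_loss(states)
print(cause.value, affected)  # system power fault ['E1']
```

With the cause attributed to a power fault in E1, recovery code can retry after power returns instead of calling out the processor fabric or cards for replacement.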
The architecture protocols of this invention are described in the text below. At a minimum, there is one enclosure monitor per enclosure. There may be a system monitor as well as an enclosure monitor within a given physical enclosure.
Note that the enclosure and system monitors are on independent power domains.
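The placement rules above can be expressed as a configuration check. The following is a minimal Python sketch under stated assumptions: `Monitor`, `check_topology`, and the enclosure and power-domain labels are hypothetical. "Independent power domains" is interpreted here as meaning that a system monitor never shares a power domain with an enclosure monitor housed in the same enclosure, and that redundant system monitors sit in separate enclosures, following the earlier description of a backup monitor in a second, separately powered enclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Monitor:
    name: str
    kind: str          # "enclosure" or "system"
    enclosure: str     # physical enclosure that houses the monitor
    power_domain: str  # power domain that feeds the monitor


def check_topology(enclosures: Sequence[str], monitors: Sequence[Monitor]) -> List[str]:
    """Return rule violations; an empty list means the layout satisfies the rules."""
    errors = []
    enclosure_monitors = [m for m in monitors if m.kind == "enclosure"]
    system_monitors = [m for m in monitors if m.kind == "system"]

    # Rule 1: at least one enclosure monitor per enclosure.
    for enc in enclosures:
        if not any(m.enclosure == enc for m in enclosure_monitors):
            errors.append(f"{enc}: no enclosure monitor")

    # Rule 2: a system monitor may share an enclosure with an enclosure
    # monitor, but not its power domain.
    for sm in system_monitors:
        for em in enclosure_monitors:
            if em.enclosure == sm.enclosure and em.power_domain == sm.power_domain:
                errors.append(f"{sm.name}: shares power domain {sm.power_domain} with {em.name}")

    # Rule 3: redundant system monitors sit in separately powered enclosures.
    homes = [sm.enclosure for sm in system_monitors]
    if len(homes) != len(set(homes)):
        errors.append("redundant system monitors share an enclosure")
    return errors


# Example: four enclosures, an enclosure monitor in each, and redundant
# system monitors in E0 and E1, each on its own power domain.
encs = ["E0", "E1", "E2", "E3"]
mons = [Monitor(f"EM{i}", "enclosure", e, f"{e}-em") for i, e in enumerate(encs)]
mons += [Monitor("SM0", "system", "E0", "E0-sm"), Monitor("SM1", "system", "E1", "E1-sm")]
print(check_topology(encs, mons))  # []
```

Returning a list of violations, rather than failing on the first one, lets a configuration tool report every misplacement in a single pass.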
A typical embodiment of the method is as follows:
A four-enclosure SMP system has an enclosure monitor in each enclosure and redundant system monitors in two of the four enclosures. Three components critical to the operation of this system's power control are as follows:
1. The concept and method for servicing one enclosure without disturbing the operation of the remainder of the system, i.e., concurrent maintenance. This requires identifying to the system control structure that a Concurrent Maintenance (CM) operation will be performed on a particular enclosure, disabling that enclosure's reset to the system control structure while the CM operation is in progress, then re-enabling system controls for the re-i...