Subsystem Thermal Recovery Within a Managed Enclosure

IP.com Disclosure Number: IPCOM000127243D
Original Publication Date: 2005-Aug-18
Included in the Prior Art Database: 2005-Aug-18
Document File: 4 page(s) / 378K

Publishing Venue

IBM

Abstract

Disclosed is a system that provides thermal recovery within a managed chassis enclosure environment. When a subsystem within the chassis enclosure uses software-based thermal detection and the main CPU running that software is powered off, the subsystem can no longer report its thermal condition to the chassis management system. This system provides a mechanism by which the subsystem can indicate to the chassis management system when power can be re-applied without risk of damaging components or running outside the thermal limits of the subsystem.

This is the abbreviated version, containing approximately 52% of the total text.

     Within an integrated managed chassis enclosure, the subsystem components typically have two distinct power domains: power domain 1 (PD1) and power domain 2 (PD2). PD1 contains the interface hardware that allows the management system for the chassis to communicate with the subsystem; this interface hardware is in the same power domain as the management system itself. PD2 contains the majority of the subsystem's circuitry, including the subsystem CPU and the heat-producing devices.

     When thermal detection is implemented in software running on the subsystem CPU, the CPU must be powered in order to continue obtaining thermal measurements. In an integrated chassis, the subsystem CPU is typically in PD2 and therefore cannot provide thermal status when PD2 is turned off. However, when a shutdown thermal condition is detected on a subsystem, the corrective action is to power off PD2 and take the subsystem out of service. Powering down this domain removes voltage from the high-power devices, including the subsystem CPU and the heat-generating ASICs, which allows them to cool down and prevents the components from running outside their operating range. It also eliminates the possibility of detecting that the temperature has returned to within operating limits when it is desired to put the subsystem back into service. The powered-off period can be very short when autonomic control systems are used, because they attempt to put the switch back into service immediately. Since PD2 is off, the current temperature is not available; therefore, the chassis enclosure management system cannot determine how long the subsystem should remain powered down in order for the hardware to cool sufficiently.
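The visibility gap described above can be illustrated with a minimal sketch. The class name `Subsystem`, its method names, and the 85 degree C threshold are hypothetical and chosen only for illustration; the disclosure does not specify them.

```python
from typing import Optional

SHUTDOWN_THRESHOLD_C = 85.0  # hypothetical shutdown threshold, not from the disclosure


class Subsystem:
    """Toy model of a subsystem whose thermal detection runs on the PD2 CPU."""

    def __init__(self) -> None:
        self.pd2_on = True
        self.temperature_c = 40.0  # actual hardware temperature

    def read_temperature(self) -> Optional[float]:
        # Software-based detection needs the PD2 CPU; with PD2 off there is no reading.
        return self.temperature_c if self.pd2_on else None

    def check_thermal(self) -> bool:
        """Run by software on the PD2 CPU; powers off PD2 on a shutdown condition."""
        reading = self.read_temperature()
        if reading is not None and reading >= SHUTDOWN_THRESHOLD_C:
            self.pd2_on = False  # corrective action: remove power from the heat producers
            return True
        return False


if __name__ == "__main__":
    sub = Subsystem()
    sub.temperature_c = 90.0
    print(sub.check_thermal())     # overtemp detected, PD2 is powered off
    sub.temperature_c = 45.0       # the hardware cools down
    print(sub.read_temperature())  # no reading: the cool-down is invisible to management
```

In this model, once PD2 is off the chassis management system has no way to learn that the temperature has recovered, which is the problem the disclosed mechanism addresses.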

     This system provides a mechanism for an integrated subsystem to recover from a sensor condition, such as a thermal overtemp condition, when the detection mechanism is logic or software running in a power domain that may be turned off. The mechanism presents the thermal overtemp condition to the chassis management system even when the subsystem CPU is powered off. In addition, the mechanism allows the thermal overtemp condition to be deasserted once a period of time has elapsed. All of this is done without the use of the subsystem CPU, which is powered down. This will then allow the chassis management system to turn...
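One way to picture the mechanism is as an overtemp indication held in PD1 and cleared by a timer. The following is a minimal sketch under stated assumptions, not the disclosed implementation: it assumes the indication is set by PD2 software before PD2 is powered off and is deasserted after a fixed hold time. The names `Pd1OvertempLatch`, `hold_seconds`, and `may_repower`, and the injectable clock, are hypothetical; the abbreviated disclosure does not say how the elapsed period is chosen or realized in hardware.

```python
import time
from typing import Callable, Optional


class Pd1OvertempLatch:
    """Toy model of an overtemp indication held in PD1, independent of the PD2 CPU."""

    def __init__(self, hold_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_seconds = hold_seconds  # assumed cool-down period
        self._clock = clock
        self._asserted_at: Optional[float] = None

    def assert_overtemp(self) -> None:
        """Assumed to be called by PD2 software on overtemp, before PD2 loses power."""
        self._asserted_at = self._clock()

    def overtemp(self) -> bool:
        """Status visible to chassis management; deasserts once the hold time elapses."""
        if self._asserted_at is None:
            return False
        if self._clock() - self._asserted_at >= self.hold_seconds:
            self._asserted_at = None  # elapsed period expired: clear the indication
            return False
        return True


def may_repower(latch: Pd1OvertempLatch) -> bool:
    """Hypothetical management policy: re-apply PD2 power only once the latch clears."""
    return not latch.overtemp()
```

Passing the clock in as a parameter keeps the sketch testable without waiting in real time; a hardware realization would presumably use a timer in PD1 instead.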