
Method for Thermal Management in Server Systems
Disclosure Number: IPCOM000238384D
Publication Date: 2014-Aug-21
Document File: 4 page(s) / 147K

Publishing Venue

The Prior Art Database


Disclosed is a method for subsystem-level thermal management of a server that handles thermal events preemptively in order to retain target performance and/or optimize cooling energy.




Thermal control of server subsystems is conventionally feedback driven. It is typically implemented as a hardware (HW) module running firmware (FW) algorithms that execute thermal control loops. Thermal control modules continuously monitor the temperature of chip modules (processor, memory buffer, etc.), and cooling capacity is adjusted according to the operating temperature relative to predetermined threshold levels. Once a device reaches its maximum temperature limit, more cooling energy is needed to bring its temperature back down to the nominal value, so cooling energy is wasted in this process. Dynamic voltage and frequency scaling (DVFS) techniques (throttling) are traditionally used to slow execution and bring the temperature down, which results in performance loss. Modules may also operate at a higher temperature point (up to the threshold), which results in higher chip leakage power.
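The conventional reactive loop described above can be sketched as follows. This is a minimal illustration, not the disclosure's method. The thresholds (70 C nominal, 85 C maximum), the 0.1 cooling step, the 0.2 cooling floor, and the names `reactive_control` and `ControlAction` are all hypothetical. The sketch shows that cooling only responds after a threshold has been crossed, and that throttling is the last resort at the limit.

```python
from dataclasses import dataclass

# Hypothetical limits in degrees Celsius; real values are device specific.
NOMINAL_C = 70.0
MAX_C = 85.0
# Hypothetical cooling step and minimum cooling level (fractions of capacity).
STEP = 0.1
FLOOR = 0.2


@dataclass
class ControlAction:
    fan_duty: float  # fraction of maximum cooling capacity, 0.0..1.0
    throttled: bool  # True if DVFS throttling is applied


def reactive_control(temp_c: float, fan_duty: float) -> ControlAction:
    """One iteration of a conventional threshold-driven feedback loop.

    Cooling reacts only after the measured temperature has already moved
    past a threshold, which is the behaviour the disclosure criticises.
    """
    if temp_c >= MAX_C:
        # At the limit: run cooling flat out and throttle the device (DVFS).
        return ControlAction(fan_duty=1.0, throttled=True)
    if temp_c > NOMINAL_C:
        # Above nominal: step cooling up, no throttling yet.
        return ControlAction(fan_duty=min(1.0, fan_duty + STEP), throttled=False)
    # At or below nominal: step cooling back down toward the floor.
    return ControlAction(fan_duty=max(FLOOR, fan_duty - STEP), throttled=False)


if __name__ == "__main__":
    duty = FLOOR
    for t in [65.0, 72.0, 80.0, 86.0, 78.0]:
        action = reactive_control(t, duty)
        duty = action.fan_duty
        print(t, round(duty, 2), action.throttled)
```

Feeding in a rising temperature trace shows cooling lagging one step behind each reading, with throttling engaged only once 85 C is reached.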

    In the feedback-based method, thermal control modules bear the burden of polling temperature data from all modules and then applying a control action. This requires complex hardware infrastructure (thermal sensors, a bus to poll sensor data) as well as considerable processing capability to process the data and execute the algorithms. Thermal control methods based on subsystem performance data also require complex infrastructure: performance/traffic statistics monitors, a polling mechanism, and sophisticated temperature prediction algorithms are needed for implementation.
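The polling burden can be made concrete with a small sketch. It assumes each sensor is exposed as a callable that performs one bus read; the function name `poll_all` and the sensor names are hypothetical. A central poller must read every sensor on every control period, whether or not anything has changed, so bus traffic grows as (number of sensors) x (number of periods).

```python
from typing import Callable, Dict, Tuple


def poll_all(sensors: Dict[str, Callable[[], float]], periods: int) -> Tuple[int, float]:
    """Poll every sensor once per control period.

    Returns (number of bus reads, hottest temperature seen). Each call to a
    sensor callable stands in for one bus transaction in a real system.
    """
    reads = 0
    hottest = float("-inf")
    for _ in range(periods):
        for read in sensors.values():
            temp_c = read()  # one bus transaction per sensor per period
            reads += 1
            hottest = max(hottest, temp_c)
        # A real controller would run its control algorithm here each period.
    return reads, hottest


if __name__ == "__main__":
    # Hypothetical sensors returning fixed readings in degrees Celsius.
    sensors = {"cpu0": lambda: 60.0, "dimm0": lambda: 45.0, "dimm1": lambda: 47.0}
    print(poll_all(sensors, periods=10))
```

As a hypothetical example, a system with 4 processor sensors and 32 memory-module sensors polled at 10 Hz generates 36 x 10 = 360 reads per second before any control computation is done.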

    Another important drawback arises in a subsystem that has an array of modules (for example, memory): the worst-case temperature of any single memory module drives the entire cooling subsystem. Energy savings are suboptimal when only a few modules are active. In that case the temperature limit of those modules drives the cooling capacity for all memory modules, so energy is wasted cooling memory modules that are not fully utilized. As depicted in Figures 1 and 2 below, the proposed method is as follows:

It proposes that application-level code (or thermal management processes) monitor device temperatures through either a polling or an interrupt-driven mechanism, and initiate control actions before devices reach their maximum temperature limit.

It proposes that thermal management processes control low-level parameters (for example, cooling capacity) using abstracted static and runtime hardware attributes (for example, component location, capacity, processor operating frequency, operating energy mode, and speed), leveraging encrypted keys that reside in and are maintained by the virtualization layer.

It suggests grouping the thermal events to est...
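The first two proposals above (application-level monitoring that acts before the maximum limit, and control driven by abstracted attributes held in the virtualization layer) can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosure's implementation. `VirtualizationLayer`, `ComponentAttributes`, `ThermalManager`, the 5 C margin, and the cooling step sizes are all hypothetical. The disclosure describes the keys as encrypted; here they are plain random tokens, and encryption is not modeled. Cooling is raised per zone, using the component's location attribute, rather than for the whole module array. This targets the array-of-modules drawback described earlier.

```python
import secrets
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ComponentAttributes:
    zone: str          # physical location / cooling zone of the component
    max_temp_c: float  # device maximum temperature limit
    freq_mhz: int      # frequency of operation (available to richer policies; unused here)
    energy_mode: str   # operating energy mode (available to richer policies; unused here)


class VirtualizationLayer:
    """Hypothetical stand-in for the virtualization layer that holds the keys.

    Callers see only opaque keys, never raw hardware identifiers. The keys
    here are random tokens; the encryption described in the disclosure is
    not modeled.
    """

    def __init__(self) -> None:
        self._attrs: Dict[str, ComponentAttributes] = {}

    def register(self, attrs: ComponentAttributes) -> str:
        key = secrets.token_hex(8)
        self._attrs[key] = attrs
        return key

    def lookup(self, key: str) -> ComponentAttributes:
        return self._attrs[key]


class ThermalManager:
    """Hypothetical application-level thermal management process."""

    def __init__(self, vlayer: VirtualizationLayer, margin_c: float = 5.0) -> None:
        self.vlayer = vlayer
        self.margin_c = margin_c                  # act this far below the device limit
        self.zone_cooling: Dict[str, float] = {}  # zone -> cooling level, 0.2..1.0

    def on_temperature(self, key: str, temp_c: float) -> None:
        """Handle one temperature report (interrupt callback or polled sample)."""
        attrs = self.vlayer.lookup(key)
        level = self.zone_cooling.get(attrs.zone, 0.2)
        if temp_c >= attrs.max_temp_c - self.margin_c:
            # Preemptive action: raise cooling for this component's zone only,
            # before the device reaches its maximum limit.
            level = min(1.0, level + 0.2)
        elif temp_c < attrs.max_temp_c - 2 * self.margin_c:
            # Comfortably below the limit: relax cooling for this zone.
            level = max(0.2, level - 0.1)
        # Between the two thresholds the zone's cooling level is held.
        self.zone_cooling[attrs.zone] = level


if __name__ == "__main__":
    vl = VirtualizationLayer()
    busy = vl.register(ComponentAttributes("zone-A", 85.0, 1600, "nominal"))
    idle = vl.register(ComponentAttributes("zone-B", 85.0, 800, "low-power"))
    tm = ThermalManager(vl)
    tm.on_temperature(busy, 81.0)  # within 5 C of the limit: zone-A cooling rises
    tm.on_temperature(idle, 60.0)  # well below the limit: zone-B stays at the floor
    print(tm.zone_cooling)
```

The manager never handles raw hardware identifiers, only the keys issued by the virtualization layer. Because the busy module's report raises cooling only in its own zone, the idle zone is not cooled to the worst-case module's needs.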