Dynamic reconfiguration based on system health and cost
Original Publication Date: 2003-Sep-09
Included in the Prior Art Database: 2003-Sep-09
Given a workload running on a large NUMA system composed of nodes, or distributed across individual blades in a blade center, a mechanism is needed to monitor system health in order to predict system failure and/or high resource utilization. Current system health monitors look at individual components of system health, e.g., the amount of available memory or the number of single-bit errors in memory. This mechanism will combine both hardware and software health indicators to form a comprehensive node health indicator. If system failure is predicted from the system health, resources can be shifted from the failing node onto a node computed as being healthy. This information could also be used to shift the workload to a less congested or lower-cost system, taking that system's health risk into account.
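A minimal sketch of how hardware and software indicators might be combined into one node health indicator, and how a target node might then be chosen by health and cost. The disclosure does not specify a formula. Everything below is a hypothetical illustration: the names `NodeStatus`, `health_score`, `failure_predicted` and `choose_target`, the weights, the error limit, the threshold, the `cost_weight` trade-off, and the use of CPU utilization as the congestion indicator. Free memory and single-bit error counts are the indicators named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical weights for combining indicators (not from the disclosure).
WEIGHTS = {"memory": 0.3, "ecc": 0.4, "load": 0.3}
ECC_ERROR_LIMIT = 100   # hypothetical: error count at which the ECC sub-score hits 0
HEALTH_THRESHOLD = 0.5  # hypothetical: below this score, failure is predicted


@dataclass
class NodeStatus:
    name: str
    free_memory_fraction: float  # software indicator, 0.0-1.0
    single_bit_errors: int       # hardware indicator: corrected memory errors
    cpu_utilization: float       # software indicator (assumed congestion measure), 0.0-1.0
    cost: float                  # relative cost of running work on this node


def _clamp(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def health_score(node: NodeStatus) -> float:
    """Combine hardware and software indicators into one score in [0, 1]."""
    memory = _clamp(node.free_memory_fraction)
    ecc = max(0.0, 1.0 - node.single_bit_errors / ECC_ERROR_LIMIT)
    load = 1.0 - _clamp(node.cpu_utilization)  # idle capacity counts as healthy
    return (WEIGHTS["memory"] * memory
            + WEIGHTS["ecc"] * ecc
            + WEIGHTS["load"] * load)


def failure_predicted(node: NodeStatus) -> bool:
    """Predict failure when the combined score drops below the threshold."""
    return health_score(node) < HEALTH_THRESHOLD


def choose_target(source: NodeStatus, nodes: List[NodeStatus],
                  cost_weight: float = 0.5) -> Optional[NodeStatus]:
    """Pick a healthy node (other than the source) to receive the workload.

    Lower key is better: poor health and high cost both add a penalty,
    with cost_weight setting the trade-off. Returns None if no node is healthy.
    """
    candidates = [n for n in nodes
                  if n.name != source.name and not failure_predicted(n)]
    if not candidates:
        return None
    return min(candidates,
               key=lambda n: (1.0 - health_score(n)) + cost_weight * n.cost)
```

For example, a node with little free memory, many single-bit errors and near-full CPU scores well below the threshold and is flagged as failing. Among the remaining nodes, a large `cost_weight` favours the cheaper node, while `cost_weight=0.0` picks purely by health. Keeping cost as a separate, tunable penalty rather than folding it into the health score lets the same indicator serve both failure prediction and cost-driven migration.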