Vigil: Segregating the SMI by Node
Original Publication Date: 2003-May-07
Included in the Prior Art Database: 2003-May-07
Multinode systems such as the IBM xSeries eServer x440 are designed for massive redundancy and power. However, the current design and defaults of the multinode system limit how System Management Interrupt (SMI) events are processed: an SMI generated on one node reaches every node. For example, consider four (4) interconnected nodes, each with redundant memory and a full complement of processors (eight (8) physical processors per node). When an error on one of the nodes generates an SMI, all of the interconnected nodes enter the SMI. If the error is catastrophic, the SMI handler generates a machine check, forcing a reboot of all of the nodes. The causing event, however, may not affect the other nodes, and an unaffected node does not need to be rebooted. Halting and rebooting those nodes needlessly reduces overall system throughput.

What is needed is an efficient way to isolate errors and SMIs by node. This invention takes advantage of the current scalability chip architecture to prevent the System Management Interrupt from being propagated from one node to another. With propagation blocked, each node handles its own SMIs independently, without impacting the performance of the unaffected nodes. From the operating system's perspective, if the system is a fully loaded two-node system (8 processors per node) and one node generates an SMI because of an error condition, that node's 8 processors enter the SMI while the other node's 8 processors continue to handle their tasks. This allows the SMI handler to perform error correction on the affected node without halting the entire multinode system.

Because the SMI is no longer propagated to all nodes, however, a method is also required to involve other nodes when the originating node determines that recovering from the condition requires their participation.
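The difference between the default behavior and per-node isolation, plus the need for an explicit escalation path, can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the x440 firmware or the scalability chip's actual interface: the classes `Node` and `MultinodeSystem`, the `propagate_smi` flag, and the `escalate` method are all hypothetical names introduced only to show the routing policy. Node ids are assumed to equal their list index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    node_id: int
    cpus: int = 8          # processors per node, as in the example above
    in_smi: bool = False   # True while this node's processors are in the SMI handler


@dataclass
class MultinodeSystem:
    # Hypothetical model: node ids are assumed to equal their index in `nodes`.
    nodes: List[Node]
    # True models the default behavior (every SMI reaches every node);
    # False models the per-node isolation described above.
    propagate_smi: bool = False

    def raise_smi(self, source_id: int) -> List[int]:
        """Deliver an SMI raised on source_id; return ids of nodes that entered it."""
        entered = []
        for node in self.nodes:
            if self.propagate_smi or node.node_id == source_id:
                node.in_smi = True
                entered.append(node.node_id)
        return entered

    def escalate(self, source_id: int, needed_ids: List[int]) -> List[int]:
        """Hypothetical escalation: the handler on source_id, having decided it
        cannot recover alone, explicitly pulls the listed nodes into the SMI."""
        if not self.nodes[source_id].in_smi:
            raise RuntimeError("only a node already in the SMI handler may escalate")
        for node_id in needed_ids:
            self.nodes[node_id].in_smi = True
        return sorted(n.node_id for n in self.nodes if n.in_smi)

    def resume(self) -> None:
        """Model every node returning from the SMI handler."""
        for node in self.nodes:
            node.in_smi = False

    def running_cpus(self) -> int:
        """Processors still executing operating-system work."""
        return sum(n.cpus for n in self.nodes if not n.in_smi)


if __name__ == "__main__":
    # Two fully loaded nodes, SMI isolated to the node that raised it.
    isolated = MultinodeSystem([Node(i) for i in range(2)], propagate_smi=False)
    print(isolated.raise_smi(0))     # [0]
    print(isolated.running_cpus())   # 8

    # Same system with the default all-node propagation.
    shared = MultinodeSystem([Node(i) for i in range(2)], propagate_smi=True)
    print(shared.raise_smi(0))       # [0, 1]
    print(shared.running_cpus())     # 0
```

In this sketch isolation is the default and involving another node is an explicit, handler-initiated step, which mirrors the requirement that the originating node decide when other nodes must take part in recovery.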