Method for Hardware Console Surveillance in a pSeries eServer
Disclosure Number: IPCOM000014099D
Original Publication Date: 2001-Jul-23
Included in the Prior Art Database: 2003-Jun-19

An LPAR (Logical PARtition) system (in this case, a pSeries eServer) is managed by a separate standalone PC, also known as the Platform Management Console (PMC). The PMC performs its tasks by sending commands to the CSP (Converged Service Processor) in the eServer system. The system administrator can configure the machine in LPAR or non-LPAR mode using the PMC. Via the PMC, the administrator can assign system resources to partitions, handle events related to changes in partition state or status, and so on.

The PMC can also run applications other than LPAR system management, including applications that help service the LPAR-capable system. For example, some service applications that run on the PMC communicate with the different partitions on the eServer system via the customer LAN. It is therefore desirable for the CSP to know when the PMC stops communicating with it, so that the CSP can log an event to the partitions informing them that the PMC is no longer functioning. This function is called PMC surveillance.

The CSP in the eServer system can monitor the presence and performance of the PMC by looking for "life-signs", or a "heartbeat", from the PMC. The PMC heartbeat can be an explicit PMC surveillance command or any other valid PMC command to the CSP. Receipt of a PMC heartbeat by the CSP resets and restarts the PMC surveillance timer. If the PMC fails to send the CSP its heartbeat within a default or user-defined time limit, the CSP logs an error and reports it to the operating system and the user. This approach allows the PMC to be disconnected from the LPAR system temporarily (for service, for example), so long as it is reconnected before the next PMC heartbeat is due.

The heartbeat frequency (that is, the interval between heartbeats) is determined by the PMC. The PMC sends a command to the CSP indicating its heartbeat frequency. Normally, this is 1 minute. In normal operation, the PMC sends commands to the CSP. If there is no work to be done at the CSP, the PMC has no need to send commands to the CSP.
In this case, the PMC heartbeat timer will expire, and the PMC will then issue a dummy command to the CSP to indicate that the PMC is still alive. The CSP heartbeat timer should be longer than the heartbeat frequency value sent by the PMC. The PMC runs on a non-real-time operating system and cannot guarantee that it will send a command exactly at the heartbeat frequency, for example because the PMC may be busy with higher-priority work.

The CSP operates in the following manner to accomplish the PMC surveillance function. A new timer is created in the CSP. When the PMC sends the heartbeat frequency (which indicates to the CSP that this PMC plans to send heartbeats), the CSP adjusts this timer to sleep for that amount of time. The frequency is given in seconds, and a frequency of 0x00 indicates that the PMC is no longer sending heartbeats. If any command is sent from the PMC (including the PMC dummy command sent to indicate that the PMC is still alive), the parser will reset the new timer.
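
The surveillance behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the CSP firmware or the PMC software: the class and method names (CspSurveillance, PmcHeartbeat, set_heartbeat_frequency, on_command, poll, dummy_due) are hypothetical, time is passed in explicitly as seconds rather than read from a real clock, and the GRACE_FACTOR of 2 is an assumed value, since the disclosure says only that the CSP timer should be longer than the PMC heartbeat frequency.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: the disclosure only says the CSP timer should be longer than
# the PMC heartbeat frequency; the factor of 2 is chosen for illustration.
GRACE_FACTOR = 2.0


@dataclass
class CspSurveillance:
    """Hypothetical model of the CSP-side PMC surveillance timer."""
    timeout_s: Optional[float] = None   # None: surveillance disabled
    deadline: Optional[float] = None    # None: timer not armed
    error_log: List[str] = field(default_factory=list)

    def set_heartbeat_frequency(self, freq_s: int, now: float) -> None:
        """Handle the PMC command that announces its heartbeat frequency."""
        if freq_s == 0:
            # A frequency of 0x00 means the PMC no longer sends heartbeats.
            self.timeout_s = None
            self.deadline = None
            return
        self.timeout_s = freq_s * GRACE_FACTOR
        self.deadline = now + self.timeout_s

    def on_command(self, now: float) -> None:
        """Any valid PMC command (including the dummy one) resets the timer."""
        if self.timeout_s is not None:
            self.deadline = now + self.timeout_s

    def poll(self, now: float) -> bool:
        """Return True and log an error if the heartbeat is overdue at `now`."""
        if self.deadline is not None and now > self.deadline:
            self.error_log.append(f"PMC heartbeat missed at t={now}")
            self.deadline = None  # report once; the next command re-arms
            return True
        return False


@dataclass
class PmcHeartbeat:
    """Hypothetical model of the PMC-side idle timer for dummy commands."""
    freq_s: int
    last_sent: float = 0.0

    def note_command_sent(self, now: float) -> None:
        """Record that a real or dummy command was sent to the CSP."""
        self.last_sent = now

    def dummy_due(self, now: float) -> bool:
        """True when no command has been sent for a full heartbeat period."""
        return now - self.last_sent >= self.freq_s
```

In this model, a PMC that announces a 60-second frequency at t=0 and sends a command at t=50 moves the CSP deadline to t=170, so polling at t=160 reports nothing and polling at t=171 logs one error. Clearing the deadline after a timeout is a design choice of the sketch: it keeps the error from being logged repeatedly while still letting the next PMC command re-arm surveillance, which matches the temporary-disconnect case the text describes.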