
Method and Apparatus to detect fan errors reliably on a defective fan control logic
Disclosure Number: IPCOM000016161D
Original Publication Date: 2002-Nov-08
Included in the Prior Art Database: 2003-Jun-21

Disclosed is a method and apparatus for reliably detecting fan errors in a data processing system whose fan control logic circuit is defective. The fan control logic is a hardware circuit that drives the fans at a particular speed to cool the data processing system. When this logic works correctly, the fans run at a steady speed without rapid fluctuations, and fan errors are detected successfully: a microprocessor called the service processor reads the current speed of every fan from its fan sensor, checks whether a given fan has dropped below a predefined threshold value, and based on the result takes appropriate action, such as logging an error. This method, however, does not correctly identify fan errors when the fan speed changes randomly because of a defective fan control logic. In that case there can be random voltage drops to a fan, so the fan keeps changing its speed rapidly, which makes it difficult to detect fan errors with existing methods.

In the proposed method, the service processor reads the current speed of a fan from the fan sensor and checks whether the speed of that fan has dropped below the predefined threshold value. If the fan is running at normal speed, the service processor records a good mark for it; if the fan is running below the threshold speed, the service processor records a bad mark in that fan's counting sequence. The service processor reads four consecutive fan speeds before marking a fan count as good or bad: all four consecutive values must be below the threshold for a bad count, and all four must be above the threshold for a good count. If the four consecutive readings are mixed, they do not form a valid counting sequence, and a new count is started.
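As a minimal sketch of the four-reading counting sequence, the following Python classifies one block of consecutive speed readings. It assumes speeds arrive as integers (for example in RPM) and that exactly four readings have already been collected; the function name `classify_sequence`, the example threshold of 2000, and the rule that a reading equal to the threshold counts as good are illustrative assumptions, not part of the disclosure.

```python
from typing import Optional, Sequence

SEQUENCE_LENGTH = 4  # four consecutive readings form one counting sequence


def classify_sequence(readings: Sequence[int], threshold: int) -> Optional[str]:
    """Hypothetical helper: classify one block of consecutive fan speeds.

    Returns "bad" if every reading is below the threshold, "good" if none is,
    and None if the block is mixed (not a valid counting sequence).
    Assumption: a reading exactly equal to the threshold is treated as good,
    since the disclosure only speaks of "below" and "above".
    """
    if len(readings) != SEQUENCE_LENGTH:
        raise ValueError(f"expected {SEQUENCE_LENGTH} readings, got {len(readings)}")
    below = [r < threshold for r in readings]
    if all(below):
        return "bad"
    if not any(below):
        return "good"
    return None  # mixed block: discarded, and a new count is started


print(classify_sequence([3000, 3100, 2950, 3050], 2000))  # good
print(classify_sequence([1500, 1200, 1800, 900], 2000))   # bad
print(classify_sequence([1500, 3000, 1200, 900], 2000))   # None (mixed)
```

Requiring the whole block to agree is what filters out the random speed swings: a single low reading caused by a voltage dip cannot produce a bad count on its own.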
Four consecutive bad fan speed readings make one bad count for a fan, and five such bad counts mark the fan as bad and cause the error to be logged. If all four consecutive readings are above the threshold, the fan's bad count is decremented; if all four are below the threshold, the bad count is incremented. The bad count is never decremented below zero. This mechanism ensures that even if fan speeds fluctuate randomly, only a genuinely bad fan is marked as bad. After a bad count has been detected five times on the same fan, the error is logged and appropriate action is taken. Furthermore, if a zero speed is detected for a fan, that reading counts as two of the four consecutive bad readings, so a stopped fan is detected twice as fast as a slow fan. Hence, in a fluctuating-speed scenario with a defective fan control logic, it takes 4 x 5 = 20 readings to flag a slow fan and 2 x 5 = 10 readings to flag a stopped fan. The same method is repeated n times, where n is the total number of fans in the data processing system, so that every fan (0 to n-1) is checked for errors. This algorithm is explained in the following figure.
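The per-fan bookkeeping described above can be sketched in Python as follows. `FanState`, `FanMonitor`, `record`, `poll`, and `readings_until_failure` are hypothetical names. The sketch assumes the service processor delivers one integer speed per fan per polling pass, collects readings in fixed blocks of four slots (a zero reading fills two slots), discards mixed blocks, and simply stops counting for a fan once its error has been logged. It illustrates the counting rules only and is not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import List

READINGS_PER_COUNT = 4  # slots in one counting sequence
BAD_COUNTS_TO_FAIL = 5  # bad counts needed to mark a fan bad
ZERO_SPEED_WEIGHT = 2   # a zero-speed reading fills two bad slots


@dataclass
class FanState:
    window_slots: int = 0  # slots filled in the current sequence
    window_bad: int = 0    # how many of those slots were below threshold
    bad_count: int = 0     # never decremented below zero
    failed: bool = False   # set once the error has been logged


class FanMonitor:
    def __init__(self, num_fans: int, threshold: int) -> None:
        self.threshold = threshold
        self.fans = [FanState() for _ in range(num_fans)]
        self.error_log: List[int] = []  # indices of fans reported as bad

    def record(self, fan: int, speed: int) -> None:
        """Feed one speed reading for fan index `fan` (0 .. n-1)."""
        state = self.fans[fan]
        if state.failed:
            return  # assumption: stop counting once the error is logged
        if speed == 0:
            weight, bad = ZERO_SPEED_WEIGHT, True
        else:
            weight, bad = 1, speed < self.threshold
        state.window_slots += weight
        if bad:
            state.window_bad += weight
        if state.window_slots < READINGS_PER_COUNT:
            return  # sequence not complete yet
        # Sequence complete (a zero reading may overfill the last slot).
        if state.window_bad == state.window_slots:
            state.bad_count += 1                             # all bad
        elif state.window_bad == 0:
            state.bad_count = max(0, state.bad_count - 1)    # all good
        # otherwise the sequence was mixed and is discarded
        state.window_slots = state.window_bad = 0            # start a new count
        if state.bad_count >= BAD_COUNTS_TO_FAIL:
            state.failed = True
            self.error_log.append(fan)

    def poll(self, speeds: List[int]) -> None:
        """One polling pass: apply the method to each of the n fans."""
        for fan, speed in enumerate(speeds):
            self.record(fan, speed)


def readings_until_failure(speed: int, threshold: int = 2000, limit: int = 1000) -> int:
    """Count readings of a constant speed until the fan is marked bad."""
    monitor = FanMonitor(num_fans=1, threshold=threshold)
    for n in range(1, limit + 1):
        monitor.record(0, speed)
        if monitor.fans[0].failed:
            return n
    raise RuntimeError("fan was never marked bad")


print(readings_until_failure(1200))  # slow fan: 20
print(readings_until_failure(0))     # stopped fan: 10
```

The two printed values reproduce the 4 x 5 = 20 and 2 x 5 = 10 reading counts stated above. Collecting readings in fixed blocks mirrors the rule that four consecutive readings are read before a count is marked; a sliding window would be an alternative design that the disclosure does not describe.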