Method to Improve Memory DIMM Diagnose on Enterprise Server

IP.com Disclosure Number: IPCOM000235424D
Publication Date: 2014-Feb-26
Document File: 4 page(s) / 52K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to improve problem determination on enterprise servers when several memory DIMMs are called out together. By industry design, memory is normally organized in quads, or at least in pairs.

This is the abbreviated version, containing approximately 49% of the total text.

Disclosed is a method to improve problem determination on enterprise servers when several memory DIMMs are called out together. By industry design, memory is normally organized in quads, or at least in pairs.

    As a result, all the memory DIMMs in a quad are likely to be deconfigured even if only one of them is actually defective. In the field, the quickest way to fix the issue is to replace all the called-out DIMMs, which wastes a great deal of money for server manufacturers. Service planning teams regularly ask why so many memory DIMMs are consumed year after year, and customers ask why so many DIMMs are defective at the same time.

    The situation can be worse. For example, the field may have only one DIMM in local stock for an urgent case, yet four DIMMs need to be replaced because no one can reliably identify which one is defective. The field engineer then has to swap the suspected DIMMs with good ones in the current machines and try to boot the machine three times. Moreover, if a good DIMM is damaged or improperly seated during the swap, the situation can get out of control.

    By physically swapping the DIMMs many times, one can find the exact culprit. But as mentioned above, the waiting time and the risk of damaging or improperly seating a DIMM are painful. Disclosed is a software solution for DIMM swapping and diagnosis that does not require physically moving the DIMMs between quads in the machine.

    With this methodology, field engineers no longer need to move DIMMs physically, which improves problem determination efficiency and helps customers form a good impression of IBM* products, especially regarding memory failure rate.

    In main memory design, the memory subsystem contains a memory controller, which sends address and data signals to the DRAM ("2") chips, and the ranks of DRAM chips (DIMMs). The disclosed methodology improves DIMM diagnosis by adding an arbitration switch mechanism to the memory controller, together with the related software function.

    The software function can be included in the server management module firmware, such as FSP or AMM. A field engineer can enable this function for a specific memory controller through the management GUI; this is called Memory Maintenance Mode. When Memory Maintenance Mode is on, the firmware automatically performs extensive testing on the specific DIMMs during IPL, and the new arbitration switch mechanism determines exactly which test is run. Figure 1 shows that the arbitration switch resides in the memory controller and provides the testing sequence for the DIMMs. There are three steps in the testing sequence:


1) In the first step, the arbitration switch sets DIMM 1 and 2 in the quad as true, meaning DIMM 1 and 2 are treated as good and do not need to be examined. If a memory issue is still reported, or in some cases IPL fails due to a memory problem, the defective DIMM is located in DIMM 3 an...
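
    The idea behind the first step can be illustrated with a small simulation. The sketch below is not the disclosed firmware. Only the first step (DIMM 1 and 2 set as true) follows the text above; the second and third steps are assumptions made for illustration, as are the names isolate_defective_dimm, simulate_ipl, and ipl_reports_error, and the premise that exactly one DIMM in the quad is defective.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Quad = Tuple[int, int, int, int]


def isolate_defective_dimm(
    quad: Quad,
    ipl_reports_error: Callable[[FrozenSet[int]], bool],
) -> Optional[int]:
    """Return the suspect DIMM in `quad`, or None if the results are inconsistent.

    `ipl_reports_error(trusted)` is a hypothetical hook: it stands for one IPL
    in which the arbitration switch sets the DIMMs in `trusted` as true (not
    examined) and returns True if a memory error is still reported.
    """
    first_pair, second_pair = quad[:2], quad[2:]

    # Step 1 (from the text): set the first two DIMMs as true.
    # If an error is still reported, the defect is in the other pair.
    if ipl_reports_error(frozenset(first_pair)):
        suspects = second_pair
    else:
        suspects = first_pair

    # Step 2 (assumed): set every DIMM except one suspect as true.
    candidate, other = suspects
    trusted = frozenset(quad) - {candidate}
    culprit = candidate if ipl_reports_error(trusted) else other

    # Step 3 (assumed): confirm by setting only the culprit as true;
    # the remaining three DIMMs are examined and should be clean.
    if ipl_reports_error(frozenset({culprit})):
        return None  # more than one defect, or an intermittent fault
    return culprit


def simulate_ipl(defective: int) -> Callable[[FrozenSet[int]], bool]:
    """Build a fake IPL for a quad with exactly one defective DIMM."""

    def ipl_reports_error(trusted: FrozenSet[int]) -> bool:
        # An error surfaces only when the defective DIMM is examined.
        return defective not in trusted

    return ipl_reports_error


if __name__ == "__main__":
    for bad in (1, 2, 3, 4):
        found = isolate_defective_dimm((1, 2, 3, 4), simulate_ipl(bad))
        print(f"defective DIMM {bad} -> isolated DIMM {found}")
```

    Under these assumptions, three IPLs are enough to single out one DIMM in a quad without touching the hardware, which matches the three boot attempts mentioned for the physical swap procedure.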