Method and System for Isolating a Problem in a Storage Loop
Disclosure Number: IPCOM000234871D
Publication Date: 2014-Feb-11
Document File: 5 page(s) / 183K

Publishing Venue

The Prior Art Database


A method and system is disclosed for isolating a problem in a storage loop. The method and system isolates a problem in the hardware of the storage loop by sequentially disabling some functions of the hardware. It then enables a user to resolve the problem.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 48% of the total text.

Method and System for Isolating a Problem in a Storage Loop

Disclosed is a method and system for isolating a problem in a storage loop. The method and system analyzes errors in the system error log to determine which hardware contains the problem and to identify the problem itself. The method and system then disables selected functions of that hardware one at a time and observes the result after each step. For instance, if a faulty component of the hardware is disabled, then the system will no longer log errors. By disabling hardware functions in this way, the method and system is able to locate the source of the errors precisely. In addition, some tools can also be used to suppress the errors.
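The sequential disabling described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the function name, the three callbacks and the `FakeLoop` simulation are hypothetical and do not come from the disclosure, which does not specify how hardware functions are disabled or how the error log is polled.

```python
from typing import Callable, Iterable, Optional


def isolate_faulty_component(
    components: Iterable[str],
    disable: Callable[[str], None],
    enable: Callable[[str], None],
    errors_logged: Callable[[], bool],
) -> Optional[str]:
    """Disable one component at a time and return the first one whose
    disabling makes the errors stop. Returns None if errors never stop."""
    for name in components:
        disable(name)
        if not errors_logged():
            # Errors stopped: leave the suspect disabled and report it.
            return name
        # Errors continue, so this component is not the cause: restore it.
        enable(name)
    return None


class FakeLoop:
    """Hypothetical stand-in for a storage loop with one faulty component."""

    def __init__(self, faulty: str) -> None:
        self.faulty = faulty
        self.disabled = set()

    def disable(self, name: str) -> None:
        self.disabled.add(name)

    def enable(self, name: str) -> None:
        self.disabled.discard(name)

    def errors_logged(self) -> bool:
        # Errors keep appearing as long as the faulty component is active.
        return self.faulty not in self.disabled


loop = FakeLoop(faulty="port-2")
suspect = isolate_faulty_component(
    ["port-0", "port-1", "port-2", "port-3"],
    loop.disable,
    loop.enable,
    loop.errors_logged,
)
```

In this sketch the suspect component is left disabled so that it stops producing errors, while every component that was tested and cleared is re-enabled.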

The errors can be logged in multiple ways under several conditions. For instance, firmware on a device adapter card logs an error if the firmware finds that a connected disk is not in a healthy state. The error type is then determined based on a number provided in the error log entry. Some errors indicate a hardware failure and are managed from an application layer.
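Determining an error type from a number in the log entry amounts to a table lookup. The sketch below assumes a log entry represented as a mapping with an `error_number` field; that field name and the numeric codes are invented for illustration, since the disclosure does not give the actual values. The error type names are the ones the disclosure lists.

```python
from enum import Enum
from typing import Mapping


class ErrorType(Enum):
    BROKEN_LINK = "broken link"
    LOST_FRAME = "lost frame"
    PORT_DISABLED = "port disabled"
    UNKNOWN = "unknown"


# Hypothetical code table: the real numbers are not given in the disclosure.
HYPOTHETICAL_ERROR_CODES = {
    101: ErrorType.BROKEN_LINK,
    102: ErrorType.LOST_FRAME,
    103: ErrorType.PORT_DISABLED,
}


def classify_entry(entry: Mapping[str, object]) -> ErrorType:
    """Map the number stored in an error log entry to an error type."""
    code = entry.get("error_number")  # assumed field name
    return HYPOTHETICAL_ERROR_CODES.get(code, ErrorType.UNKNOWN)
```

Unrecognized or missing numbers fall back to `ErrorType.UNKNOWN` rather than raising, so that a malformed entry does not interrupt log analysis.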

The method and system utilizes a problem isolation module to call a handling routine, referred to as an Error Recovery Procedure (ERP), to recover from the errors. The errors are of different types such as, but not limited to, a broken link error, a lost frame error and a port disabled error, and each error type is recovered by a separate ERP. For example, a disabled port ERP is utilized to find the storage enclosure closest to the device adapter that contains a disk with a disabled port. There can be multiple disk drives with disabled ports, wherein the one closest to the device adapter is considered to be the root cause of the corresponding error. Further, a broken link ERP focuses on port errors on a fiber channel switch card to identify the root cause of a loop error. Similarly, a lost frame ERP gathers information from historical error logs for the lost frame error. Thereafter, the method and system locates specific data associated with an error resource stored in a local database to determine the locations of potential hardware failure. For instance, if the lost frame error occurs in any disk in the storage loop, then the potential hardware failure can be in any section of the entire storage loop.
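The selection rule of the disabled port ERP (the disk with a disabled port that is closest to the device adapter is treated as the root cause) can be illustrated with a short sketch. The `Disk` record and its `hops_from_adapter` field are assumptions made for this example; the disclosure does not say how distance from the device adapter is represented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Disk:
    disk_id: str
    enclosure: str
    hops_from_adapter: int  # assumed distance measure along the loop
    port_disabled: bool


def disabled_port_root_cause(disks: Iterable[Disk]) -> Optional[Disk]:
    """Return the disk with a disabled port that is closest to the
    device adapter, or None if no disk has a disabled port."""
    candidates = [d for d in disks if d.port_disabled]
    if not candidates:
        return None
    return min(candidates, key=lambda d: d.hops_from_adapter)


disks = [
    Disk("d0", "enc-0", hops_from_adapter=1, port_disabled=False),
    Disk("d1", "enc-1", hops_from_adapter=2, port_disabled=True),
    Disk("d2", "enc-2", hops_from_adapter=3, port_disabled=True),
]
root = disabled_port_root_cause(disks)
```

Here `root` is the disk in `enc-1`: the disk in `enc-2` also has a disabled port but sits farther from the adapter, so it is not selected.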

The method and system calls a fail disk ERP when a disk drive is identified as failed because of exceeding a threshold or causing other problems. The objective of the fail disk ERP is to remove the disk drive from the storage loop to prevent it from causing further performance or access degradation. A next level support ERP is called to deliver a service event with a possible failed hardware component in a Field Replaceable Unit (FRU) list. Here, an engineer is required to provide next level support for the next level support ERP.
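A minimal sketch of the two procedures above, assuming the threshold is a per-disk error count. The threshold value, the `StorageLoop` structure and the shape of the service event are hypothetical; the disclosure specifies none of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ERROR_THRESHOLD = 5  # hypothetical value, not taken from the disclosure


@dataclass
class StorageLoop:
    disks: List[str]
    error_counts: Dict[str, int] = field(default_factory=dict)


def fail_disk_erp(loop: StorageLoop, threshold: int = ERROR_THRESHOLD) -> List[str]:
    """Remove every disk whose error count exceeds the threshold from the
    loop and return the list of removed disks."""
    failed = [d for d in loop.disks if loop.error_counts.get(d, 0) > threshold]
    loop.disks = [d for d in loop.disks if d not in failed]
    return failed


def next_level_support_event(fru_list: List[str]) -> Dict[str, object]:
    """Build a service event carrying the possibly failed components as an
    FRU list for an engineer to act on (assumed event layout)."""
    return {"event": "service", "fru_list": list(fru_list)}


loop = StorageLoop(disks=["d0", "d1", "d2"], error_counts={"d1": 9, "d2": 5})
failed = fail_disk_erp(loop)
event = next_level_support_event(failed)
```

In this example only `d1` exceeds the threshold; `d2` sits exactly at the threshold and stays in the loop.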

Every disk in an enclosure logs a loop error if an uplink fiber channel has a failure. The device is assume...