
METHOD FOR FLEXIBLE AND DIVERSE ERROR HANDLING FOR STORAGE DEVICES

IP.com Disclosure Number: IPCOM000233966D
Publication Date: 2014-Jan-06
Document File: 6 page(s) / 271K

Publishing Venue

The IP.com Prior Art Database

Abstract

This disclosure describes a method by which a storage device provides the storage subsystem with a suggested error-handling action for each error type. An error code handling table defines, for each error type, the action that the storage device suggests the storage subsystem take. This table resides in the storage device. The storage subsystem obtains the table during initialization of the storage device, or at any later time as required, through standard interface communication. When the storage device reports an error to the storage subsystem, the suggested action for that error type is looked up in the table and can then be applied by the storage subsystem. In another aspect of the invention, the content of the table can be altered, either by the storage device itself through a code download or upgrade, or by exchange between the storage subsystem and the device over a predefined and, if needed, authenticated interface, thereby supporting a more flexible error-handling strategy.
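The table-driven flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the class names, the action names, the `read_handling_table` and `update_handling_table` methods, and the fallback to a default action for codes missing from the table are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action names; the disclosure does not enumerate them.
    RETRY = "retry"
    REJECT_DEVICE = "reject_device"
    IGNORE = "ignore"


@dataclass
class StorageDevice:
    """Stand-in for a device that holds its own error code handling table."""
    handling_table: dict  # error code -> suggested Action

    def read_handling_table(self):
        # A real device would answer a standard-interface command here;
        # this stand-in simply returns a copy of its table.
        return dict(self.handling_table)

    def update_handling_table(self, changes):
        # Models a table change, e.g. after a code download or an
        # authenticated request from the subsystem (assumed API).
        self.handling_table.update(changes)


class StorageSubsystem:
    def __init__(self, default_action=Action.RETRY):
        # Fallback for error codes the device's table does not list
        # (an assumption; the abstract does not cover this case).
        self.default_action = default_action
        self.tables = {}

    def attach(self, device_id, device):
        # Obtain the table during device initialization.
        self.tables[device_id] = device.read_handling_table()

    def refresh(self, device_id, device):
        # Re-read the table at any later time, e.g. after it was altered.
        self.tables[device_id] = device.read_handling_table()

    def suggested_action(self, device_id, error_code):
        # Look up the device's suggestion for a reported error code.
        return self.tables[device_id].get(error_code, self.default_action)
```

With this shape, changing the handling policy means changing the data held by the device and re-reading it; the subsystem's lookup code stays the same.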

This is the abbreviated version, containing approximately 36% of the total text.



1. Background

A storage device (preferably a disk drive) can generate error codes upon detecting certain error conditions and report them to the storage subsystem to which it is attached. In one embodiment, a SCSI (Small Computer System Interface) target device returns an error code known as a Key Code Qualifier (KCQ) to a SCSI initiator device. "When a SCSI target device returns a check condition in response to a command, the initiator usually then issues a SCSI Request Sense command. The target will respond to the Request Sense command with a set of SCSI sense data which includes three fields giving increasing levels of detail about the error.[1]"
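As an illustration of how an initiator might pull the three KCQ fields (sense key, additional sense code, additional sense code qualifier) out of returned sense data, the sketch below handles the two standard SCSI sense data layouts. The `KCQ` type and the `parse_kcq` function are hypothetical names introduced for this example; only the byte offsets come from the SCSI sense data formats.

```python
from typing import NamedTuple


class KCQ(NamedTuple):
    key: int   # sense key
    asc: int   # additional sense code
    ascq: int  # additional sense code qualifier


def parse_kcq(sense: bytes) -> KCQ:
    """Extract (sense key, ASC, ASCQ) from SCSI sense data."""
    if not sense:
        raise ValueError("empty sense data")
    response_code = sense[0] & 0x7F  # bit 7 of byte 0 is not part of the code
    if response_code in (0x70, 0x71):
        # Fixed format: key in byte 2 (low nibble), ASC byte 12, ASCQ byte 13.
        if len(sense) < 14:
            raise ValueError("fixed-format sense data too short")
        return KCQ(sense[2] & 0x0F, sense[12], sense[13])
    if response_code in (0x72, 0x73):
        # Descriptor format: key in byte 1 (low nibble), ASC byte 2, ASCQ byte 3.
        if len(sense) < 4:
            raise ValueError("descriptor-format sense data too short")
        return KCQ(sense[1] & 0x0F, sense[2], sense[3])
    raise ValueError(f"unsupported response code 0x{response_code:02X}")
```

The resulting tuple can serve directly as a lookup key for whatever error-handling logic the initiator applies.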

The initiator can then take action based on the error code. FIG. 1 shows an embodiment of the existing mechanism, in which a storage subsystem takes action based on the error code returned by a storage device.

FIGURE 1
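For orientation, an existing mechanism of this kind, in which the subsystem itself decides what to do with each error code, might look roughly like the sketch below. It is not a reproduction of Figure 1: the policy entries, the action names, the retry limit, and the `choose_action` function are all assumptions made for illustration.

```python
# Hypothetical policy compiled into the storage subsystem's own image.
# Keys are (sense key, ASC, ASCQ) tuples; values are assumed action names.
HARD_CODED_POLICY = {
    (0x03, 0x11, 0x00): "retry",          # medium error, unrecovered read error
    (0x04, 0x44, 0x00): "reject_device",  # hardware error, internal target failure
}
DEFAULT_ACTION = "retry"  # assumed fallback for codes not listed above
MAX_RETRIES = 3           # assumed retry limit


def choose_action(kcq, attempts_so_far):
    """Pick an action using only logic built into the subsystem."""
    action = HARD_CODED_POLICY.get(kcq, DEFAULT_ACTION)
    if action == "retry" and attempts_so_far >= MAX_RETRIES:
        # Retries exhausted: give up on the device.
        return "reject_device"
    return action
```

Because the mapping and the retry limit live in the subsystem's code, any change to either one is a change to that code.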


Disadvantages / Limitations:

1. Not flexible to change. Changes to the error handling methods are inevitable. For instance, the growing demand for shorter response times to satisfy time-critical client applications could lead the designer of a storage subsystem to choose a more aggressive error handling method. As an example, the storage system could switch from its existing method to rejecting the device on a single occurrence of the error. The culprit device is then ejected without time being spent on retry or recovery, while the data is retrieved or reconstructed from redundant devices to guarantee a quick overall response time.

As the embodiment shows, the error code handling logic is currently typically coded in the storage subsystem's firmware/software. Any change requires rebuilding the firmware/software image and downloading it to the storage subsystem as an upgrade.

2. Not easy to support differentiation. "In practice there are many KCQ values which are common between different SCSI device types and different SCSI device vendors.[1]" However, there are also vendor-specific and proprietary KCQs which the storage subsystem should be able to recognize and handle. Furthermore, even for the same KCQ value, different SCSI device vendors may sometimes prefer different actions.

In the existing mechanism, this requires additional logic in the storage subsystem to query and identify the type of the device, and to choose different code paths for the different types of handling.

3. The error handling method chosen by the storage subsystem may sometimes not match what the storage device truly requires, especially if the objective is to recover from this type of error. "This increasing sophistication of disk drives and other peripheral devices has often included on-board diagnostic and recovery capability.[2]" And "peripheral devices have grown in sophistication to perform many of the functions related to their operation and maintenance, with minimal support requ...