
Enhanced RAID-5 Error Recovery in Response to Drive Hard Read Errors

IP.com Disclosure Number: IPCOM000123202D
Original Publication Date: 1998-Jul-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 4 page(s) / 172K

Publishing Venue

IBM

Related People

Pacheco, JF: AUTHOR (and 3 others)

Abstract

Hard disk drives have extensive error recovery capability, but occasionally, hard unrecoverable Read errors (for reasons explained below) do occur. In a RAID subsystem, when a hard error occurs, the adapter can reconstruct the lost data. This data can then be written back to the drive at a new, error-free site.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 34% of the total text.


   Problem

   Hard disk drives have extensive error recovery capability,
but occasionally, hard unrecoverable Read errors (for reasons
explained below) do occur.  In a RAID subsystem, when a hard error
occurs, the adapter can reconstruct the lost data.  This data can
then be written back to the drive at a new, error-free site.

   Without this invention, after the drive reports the hard
error, the RAID adapter:
  o  pauses handling requests to the portion of the array that
      had the error, ultimately suspending responses to the host
      system/operating system/applications
  o  reads data from several other drives, using it to
      reconstruct the lost data
  o  issues a command to reassign the defective sector to a new
      location
  o  issues a write command to place the reconstructed data in
      the new location
  o  resumes handling any suspended requests to the array
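The legacy sequence above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not the adapter's actual firmware. The `Drive` class, `HardReadError`, the all-zeros contents of a reassigned sector, and the single-stripe layout (the same LBA on every member, with one member holding parity) are all hypothetical simplifications. Reconstruction uses standard RAID-5 arithmetic: the lost strip is the XOR of all surviving strips in the stripe, including parity.

```python
class HardReadError(Exception):
    """Hypothetical: raised when a drive cannot recover a sector."""


def xor_blocks(blocks):
    """XOR equal-length byte strings together (RAID-5 parity arithmetic)."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


class Drive:
    """Hypothetical toy drive: LBA -> sector data, plus a set of defective LBAs."""

    def __init__(self, sectors, defective=()):
        self.sectors = dict(sectors)
        self.defective = set(defective)

    def read(self, lba):
        if lba in self.defective:
            raise HardReadError(lba)
        return self.sectors[lba]

    def reassign(self, lba):
        # Remap the LBA to a spare location; assume the spare holds zeros.
        self.defective.discard(lba)
        self.sectors[lba] = bytes(len(self.sectors[lba]))

    def write(self, lba, data):
        self.sectors[lba] = data


def legacy_recover(array, failed, lba, log):
    """Adapter-driven recovery, following the five steps listed above."""
    log.append("pause")  # 1. suspend requests to the affected region
    survivors = [d.read(lba) for i, d in enumerate(array) if i != failed]
    data = xor_blocks(survivors)  # 2. reconstruct from the other drives
    array[failed].reassign(lba)  # 3. reassign the defective sector
    array[failed].write(lba, data)  # 4. write reconstructed data to it
    log.append("resume")  # 5. resume suspended requests
    return data


def array_read(array, member, lba, log):
    """Read one strip, falling back to recovery on a hard read error."""
    try:
        return array[member].read(lba)
    except HardReadError:
        return legacy_recover(array, member, lba, log)


# Usage: a 3-drive stripe at LBA 7 whose first data strip is unreadable.
d0, d1 = b"\x11\x22", b"\x0f\xf0"
array = [Drive({7: d0}, defective={7}), Drive({7: d1}), Drive({7: xor_blocks([d0, d1])})]
log = []
print(array_read(array, 0, 7, log) == d0)  # True
print(array[0].read(7) == d0, log)         # True ['pause', 'resume']
```

Every step in `legacy_recover` runs in the adapter, between the "pause" and the "resume". This is why the host sees the delay described next.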

   Accomplishing these steps takes the adapter considerable
time: seconds to tens of seconds.  In addition, there is a risk of
data loss if the adapter does not complete the operation
successfully, particularly while the reassignment is in progress.
For example, if power is removed from the system containing the
adapter that is performing the reassignment, the drive will still
complete the reassignment, but the sector in question will then
contain some default pattern rather than the actual customer data.
This is because the correct data reaches the drive only afterwards,
in the Write command containing the reconstructed data, as
described in the steps above.
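The exposure window described above can be shown with an even smaller toy model. The `Drive` class, the `PowerLoss` exception, and the all-zeros default pattern are hypothetical. The sketch makes one point. Once the drive has finished the reassignment, a later read of that sector in this model returns the default pattern without any error. Nothing then prompts the adapter to reconstruct the data again.

```python
DEFAULT_PATTERN = bytes(4)  # assumed default contents of a freshly reassigned sector


class PowerLoss(Exception):
    """Hypothetical: models power being removed from the adapter's system."""


class Drive:
    """Hypothetical toy drive that completes a reassignment on its own."""

    def __init__(self):
        self.sectors = {}

    def reassign(self, lba):
        self.sectors[lba] = DEFAULT_PATTERN  # spare sector, no customer data yet

    def write(self, lba, data):
        self.sectors[lba] = data

    def read(self, lba):
        return self.sectors[lba]  # no error: the sector is now on good media


def reassign_then_write(drive, lba, reconstructed, power_fails_between):
    drive.reassign(lba)  # step 3: the drive finishes this by itself
    if power_fails_between:
        raise PowerLoss  # step 4 (the Write) is never issued
    drive.write(lba, reconstructed)


ok, lost = Drive(), Drive()
reassign_then_write(ok, 5, b"DATA", power_fails_between=False)
try:
    reassign_then_write(lost, 5, b"DATA", power_fails_between=True)
except PowerLoss:
    pass
print(ok.read(5))    # b'DATA'
print(lost.read(5))  # b'\x00\x00\x00\x00' (silently wrong)
```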

   This invention uses enhanced capabilities of the drive to
off-load some of these steps to the drive itself.  It thereby allows
a RAID subsystem to encapsulate data integrity information
(checkpoints) in the same device that holds the data itself.  This
greatly enhances the robustness of the entire subsystem, especially
in applications that use redundant array controllers (such as
clusters).

   Solution

   Some SCSI hard disk drives support automatic reassignment
of sectors with hard errors that occur on Read commands.  This
invention requires that the drive support this function and that
the function be enabled.
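In the SCSI Block Commands (SBC) standard, automatic reassignment on reads is controlled by the ARRE (Automatic Read Reallocation Enabled) bit. This is bit 6 of byte 2 of the Read-Write Error Recovery mode page (page code 01h); the AWRE bit beside it (bit 7) covers writes. The sketch below assumes that ARRE is the function the disclosure refers to. It sets ARRE in a copy of the page bytes as returned by MODE SENSE, preparing them to be sent back with MODE SELECT. The mode parameter header, block descriptors, and the platform-specific calls that actually issue those commands are not shown. The function names are hypothetical.

```python
RW_ERROR_RECOVERY_PAGE = 0x01  # page code 01h
ARRE = 0x40                    # byte 2, bit 6: automatic read reallocation enabled


def arre_enabled(page: bytes) -> bool:
    """Report whether the ARRE bit is set in a Read-Write Error Recovery page."""
    return bool(page[2] & ARRE)


def enable_arre(page: bytes) -> bytes:
    """Return a copy of the page with ARRE set, prepared for MODE SELECT."""
    buf = bytearray(page)
    if (buf[0] & 0x3F) != RW_ERROR_RECOVERY_PAGE:
        raise ValueError("not a Read-Write Error Recovery mode page")
    buf[0] &= 0x7F  # clear PS (bit 7): it is reserved in MODE SELECT data
    buf[2] |= ARRE
    return bytes(buf)


# Usage: page as returned by MODE SENSE (PS set, AWRE set, ARRE clear).
sensed = bytes([0x81, 0x0A, 0x80] + [0] * 9)  # 2-byte page header + 10 bytes
updated = enable_arre(sensed)
print(arre_enabled(sensed), arre_enabled(updated))  # False True
print(updated[:3].hex())                            # 010ac0
```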

   Hard errors are those that cannot be recovered by the
drive's built-in error recovery capability.  These recovery methods
include hardware on-the-fly error correction (ECC), longitudinal
redundancy checking (LRC), and more complicated methods supported by
drive microcode, such as off-track reading and increased gain while
reading.  Hard errors are thus unrecoverable errors, caused by
defects in the media or by contamination.  Defects occur when the
head or slider strikes the media at high velocity, either in a
dynamic mode (disks spinning) or in a static mode (disks not
spinning).  The
defect effective...