Two-Level DASD Failure Recovery Method

IP.com Disclosure Number: IPCOM000104105D
Original Publication Date: 1993-Mar-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 4 page(s) / 104K

Publishing Venue

IBM

Related People

Ouchi, NK: AUTHOR

Abstract

Double DASD failure recovery methods require N+2 units for N units of data, that is, two additional units per group of N data units. Disclosed is a mechanism for a DASD array with M groups of N drives in which only M+1 additional units are required, compared with 2M units. The redundant information is structured as a two-level mechanism. Recovery from a single unit failure is faster than recovery from two unit failures.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Double DASD failure recovery methods require N+2 units for N
units of data, that is, two additional units per group of N data
units.  Disclosed is a mechanism for a DASD array with M groups of N
drives in which only M+1 additional units are required, compared with
2M units.  The redundant information is structured as a two-level
mechanism.  Recovery from a single unit failure is faster than
recovery from two unit failures.
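
      The unit counts above can be checked with a small calculation.
The following is only an illustrative sketch: the function names are
hypothetical, and the split of the M+1 units into one P per group plus
one shared Q follows the description later in this disclosure.

```python
def redundancy_units(m_groups: int, two_level: bool) -> int:
    """Redundant units needed to protect m_groups groups against double failure."""
    if two_level:
        # One P unit per group plus a single Q unit shared by all groups.
        return m_groups + 1
    # Conventional approach: every group carries its own P and its own Q.
    return 2 * m_groups


def overhead_fraction(m_groups: int, n_data: int, two_level: bool) -> float:
    """Fraction of all units in the array that hold redundant information."""
    redundant = redundancy_units(m_groups, two_level)
    return redundant / (m_groups * n_data + redundant)


if __name__ == "__main__":
    # M = 2 groups of N = 3 data units, matching the Fig. 1 layout.
    for two_level in (False, True):
        print(two_level,
              redundancy_units(2, two_level),
              overhead_fraction(2, 3, two_level))
```

For M = 2 and N = 3 the two-level scheme needs 3 redundant units where
the conventional scheme needs 4, and the saving grows with M.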

      In a DASD array, the mean time to failure may be greatly
increased if failing units are reconstructed rapidly.  However,
reconstruction may take several hours because of the large capacity
of the units (now gigabytes) and the performance impact on the array.
The ability to correct two failures in the array covers this period
and makes the array effectively failure-proof, as long as power and
control elements are also designed to be fault tolerant.  This
disclosure shows a method for recovery from failure of two units with
low capacity overhead and rapid single-unit recovery.

      A Reed-Solomon (RS) or b-adjacent block code may be used to
correct one or two unit failures in a group of units.  This requires
two units of redundant information derived from the data in the
group.  Thus, for N units of data, N+2 units are required.  The two
redundant units are called P & Q, where P is the Exclusive Or (XOR)
of the data and Q is the XOR of each data element multiplied by a
binary matrix, Lambda, taken to an integer power.  This
multiplication is implemented as a matrix of XOR gates.  P & Q and
the mechanism to create and reconstruct data are described in the
cited disclosure.
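
      The P and Q computation can be sketched in software.  This is a
minimal illustration under stated assumptions, not the mechanism of
the cited disclosure: the binary matrix Lambda is modelled as
multiplication by the element 2 in GF(2^8) with the polynomial 0x11D,
the power applied to unit i is assumed to be i (starting at 0), and all
function names are hypothetical.

```python
from functools import reduce

# Assumption: 8-bit symbols in GF(2^8) with polynomial x^8+x^4+x^3+x^2+1.
POLY = 0x11D
LAMBDA = 2  # stand-in for the binary matrix Lambda (multiply by x)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements using shifts and XORs only."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(base: int, exp: int) -> int:
    """Raise a GF(2^8) element to a non-negative integer power."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def compute_p(units: list) -> bytes:
    """P is the byte-wise XOR of all data units."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*units))


def compute_q(units: list) -> bytes:
    """Q is the XOR of each unit multiplied by Lambda**i (assumed i = unit index)."""
    q = bytearray(len(units[0]))
    for i, unit in enumerate(units):
        coeff = gf_pow(LAMBDA, i)
        for pos, byte in enumerate(unit):
            q[pos] ^= gf_mul(coeff, byte)
    return bytes(q)
```

Because multiplication by a fixed field element is linear over single
bits, each product here corresponds to the kind of XOR-gate matrix the
text describes.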

      This disclosure shows that, for an array, the units may be
separated into multiple domains, each with its own P for single unit
recovery and a single Q for recovery of two failures in one domain.
This is illustrated in Fig. 1 where P1 covers units A, B, & C and P2
covers D, E, & F. Q covers A, B, C, D, E, & F for double failures.
For a single failure, P1 or P2 is sufficient to recover the data.  In
fact, single failures in domains are independent and multiple single
failures in multiple domains can be recovered just using the P
informat...
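
      The two-level recovery for the Fig. 1 layout can be sketched as
follows.  This is a hedged illustration using standard P+Q algebra, not
necessarily the procedure of the disclosure: each unit is reduced to a
single byte, Lambda is modelled as the element 2 in GF(2^8) with
polynomial 0x11D, the power for unit k is assumed to be k, and all
names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8), polynomial 0x11D (an assumption of this sketch)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(base, exp):
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def gf_inv(a):
    return gf_pow(a, 254)  # a**255 == 1 for any non-zero a


# Units A..F are indices 0..5; P1 covers A, B, C and P2 covers D, E, F.
DOMAINS = {"P1": [0, 1, 2], "P2": [3, 4, 5]}


def domain_of(unit):
    return next(name for name, members in DOMAINS.items() if unit in members)


def encode(data):
    """Return ({domain: P}, Q) for six one-byte data units."""
    p = {}
    for name, members in DOMAINS.items():
        p[name] = 0
        for k in members:
            p[name] ^= data[k]
    q = 0
    for k, value in enumerate(data):
        q ^= gf_mul(gf_pow(2, k), value)
    return p, q


def recover_single(data, p, lost):
    """Rebuild one lost unit from the P of its own domain only."""
    name = domain_of(lost)
    value = p[name]
    for k in DOMAINS[name]:
        if k != lost:
            value ^= data[k]
    return value


def recover_double(data, p, q, i, j):
    """Rebuild lost units i and j of one domain from that domain's P and Q."""
    name = domain_of(i)
    assert j in DOMAINS[name], "both failures must be in the same domain"
    s_p = p[name]  # becomes d_i ^ d_j
    for k in DOMAINS[name]:
        if k not in (i, j):
            s_p ^= data[k]
    s_q = q  # becomes L**i * d_i ^ L**j * d_j
    for k, value in enumerate(data):
        if k not in (i, j):
            s_q ^= gf_mul(gf_pow(2, k), value)
    li, lj = gf_pow(2, i), gf_pow(2, j)
    d_j = gf_mul(s_q ^ gf_mul(li, s_p), gf_inv(li ^ lj))
    return s_p ^ d_j, d_j
```

In this sketch a single failure reads only the surviving units of one
domain, whereas a double failure must read every surviving data unit
in the array to strip the other domains' contributions out of Q, which
is consistent with single-unit recovery being the faster case.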