High performance dual parity scheme for surviving permanent ECC checks encountered during RAID rebuild

IP.com Disclosure Number: IPCOM000016287D
Original Publication Date: 2002-Oct-28
Included in the Prior Art Database: 2003-Jun-21
Document File: 2 page(s) / 44K

Publishing Venue

IBM

Abstract

RAID storage controllers have been used for many years to recover data lost when a single disk fails. The RAID levels used to support various performance characteristics are well known in the art, ranging from simple mirroring to complex striped RAID 5 designs. As disk drives continue to increase in capacity, the amount of data that must be read in order to successfully rebuild a RAID array continues to increase dramatically. With physical disks currently being offered in sizes up to 146GB each, a simple 15+P RAID 5 array requires that 2.2 TB (2.2*10^12 bytes) of data be successfully read in order to rebuild the array. Current disk technology predicts an uncorrectable ECC check every 10^14 bytes read, so as arrays increase in capacity it becomes more likely that a rebuild operation will not be able to complete successfully.

SMART technology is becoming prevalent in the industry to help minimize unexpected physical disk drive failures. Drive failures, however, do occur. In addition, data scrubbing techniques are used to attempt to find and fix any uncorrectable ECC errors before the data is required during a rebuild operation. Both techniques have limitations. SMART helps predict when a disk will fail, but it does not help with the general problem of occasional uncorrectable ECC checks due to media errors. Data scrubbing helps find and fix these checks; however, if a noticeable performance penalty is to be avoided, it can take many days to completely scrub every disk behind a controller.

A technique to solve this problem is to store an additional sector of parity data with each cache line. If a 64K cache line size is used (128 sectors of 512 bytes), the cache line holds 127 sectors of data followed by a single additional sector of parity that represents the XOR of all of the data in the line. If an uncorrectable ECC check is encountered for any sector in the cache line, the sector may be rebuilt by XOR'ing each of the other sectors in the cache line together with the parity sector for the cache line. In normal operation, the sector could also be rebuilt from the data and parity on the other drives in the RAID array. However, if the uncorrectable ECC check occurs during a RAID rebuild operation, reconstruction from the other drives would not be possible. Performance
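
The rebuild-size figures quoted above (146GB disks, a 15+P array, one uncorrectable ECC check per 10^14 bytes read) can be turned into a rough estimate of how often a rebuild would hit at least one such check. The sketch below is illustrative only: it assumes errors arrive independently at a uniform rate (a Poisson model, which the disclosure does not state), and the function name is hypothetical. Under those assumptions the chance comes out at roughly 2% per rebuild.

```python
import math

DISK_BYTES = 146e9        # capacity of each disk, from the text
DISKS_READ = 15           # surviving disks read to rebuild a 15+P array
BYTES_PER_ERROR = 1e14    # one uncorrectable ECC check per 10^14 bytes, from the text


def rebuild_failure_probability(bytes_read, bytes_per_error):
    """Probability of at least one uncorrectable ECC check while reading.

    Assumption (not from the disclosure): errors are independent and occur
    at a uniform rate, so their count follows a Poisson distribution.
    """
    expected_errors = bytes_read / bytes_per_error
    # 1 - exp(-x), computed with expm1 for accuracy when x is small.
    return -math.expm1(-expected_errors)


bytes_read = DISK_BYTES * DISKS_READ
probability = rebuild_failure_probability(bytes_read, BYTES_PER_ERROR)
print(f"bytes read during rebuild: {bytes_read:.3g}")
print(f"chance of at least one uncorrectable ECC check: {probability:.2%}")
```

Because the expected number of errors scales linearly with the bytes read, doubling the disk capacity or the array width roughly doubles this probability while it remains small.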
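
The per-cache-line parity described above can be sketched in a few lines. This is a minimal illustration, not the controller's actual implementation: it assumes 512-byte sectors (64K divided by 128 sectors), holds the cache line in memory as a list of byte strings, and uses hypothetical function names.

```python
SECTOR_SIZE = 512                     # bytes per sector (assumed: 64K / 128 sectors)
SECTORS_PER_LINE = 128                # 127 data sectors + 1 parity sector
DATA_SECTORS = SECTORS_PER_LINE - 1


def xor_sectors(sectors):
    """Return the byte-wise XOR of a sequence of equally sized sectors."""
    acc = bytearray(SECTOR_SIZE)
    for sector in sectors:
        if len(sector) != SECTOR_SIZE:
            raise ValueError("every sector must be SECTOR_SIZE bytes")
        for i, byte in enumerate(sector):
            acc[i] ^= byte
    return bytes(acc)


def build_cache_line(data_sectors):
    """Append one parity sector (XOR of all data sectors) to 127 data sectors."""
    if len(data_sectors) != DATA_SECTORS:
        raise ValueError("a cache line holds exactly 127 data sectors")
    return list(data_sectors) + [xor_sectors(data_sectors)]


def rebuild_sector(line, bad_index):
    """Rebuild the sector at bad_index by XOR'ing every other sector in the line."""
    survivors = [s for i, s in enumerate(line) if i != bad_index]
    return xor_sectors(survivors)


# Deterministic sample data so the example is repeatable.
data = [bytes((i * 31 + j) % 256 for j in range(SECTOR_SIZE))
        for i in range(DATA_SECTORS)]
line = build_cache_line(data)

# Pretend sector 42 returned an uncorrectable ECC check and rebuild it locally.
recovered = rebuild_sector(line, 42)
print("sector 42 recovered:", recovered == data[42])
```

The rebuild touches only sectors within the same cache line on the same drive, which is why it still works during a RAID rebuild when the other drives can no longer supply redundancy. It can recover only one unreadable sector per cache line.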

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.
