Stretch Codes: A System and Method for Boosting Storage Efficiency in RAID Systems
Original Publication Date: 2004-Mar-17
Included in the Prior Art Database: 2004-Mar-17
A system and method for design of XOR based RAID erasure codes, with the goal of achieving high efficiency. The design principle adds additional space for redundancy symbols, using them to gain efficiency without gaining additional distance.
Stretch Codes: A System and Method for boosting efficiency of certain RAID erasure codes
Jim Hafner, Jeff Hartline, Tapas Kanungo
Disclosed is a system and method to boost the efficiency of certain RAID (Redundant Array of Independent Disks) erasure codes for use in disk array storage devices. The methodology applies to the design of XOR based codes. It is a code design principle that claims that adding additional space for redundancy symbols without increasing distance enables significantly more additional space for data (information symbols). This can be done in many cases without significant penalties in other code metrics such as short write IO cost or parity complexity.
Codes designed by this methodology are called "stretch" codes because the code word gets larger rapidly as we add additional parity symbols. These codes are all "horizontal" codes in that the parity is designed to be placed on disks (or strips) that do not contain user data. (Here we limit this notion to data and parity from the same code instance or co-dependent data and parity - this does not preclude striping multiple code instances on disks with rotation for performance considerations.) Another interpretation of "horizontal" is that the generator matrix of the code has the form G=(In | Pn× p), an identity matrix together with a parity computation component.
There are four parameters that we use as the basis for our code constructions: d is the distance, n denotes the number of data disks, p is the number of parity disks and r is the number of rows in the 2-dimensional code. It is known that p≥d-1 is a necessary condition for any distance d code. The parameter r is called the symbol size in coding theory, and the value r(n+p) is the word length. Other parameters of interest are PinD and DoutD, the parity in-degree (maximum number of terms in a parity equation) and data out-degree (maximum number of parity dependent on a data element). We also consider the "block" versions of these values (here, block refers to chunks of size r, that is, code symbols. In the generator matrix G, the PinD is the maximum number of ones in any column and DoutD is the maximum number of ones in any row (less one).
When r≥2, the generator matrix G has the same general form, but each non-zero entry is replaced by some r×r non-zero block matrix; in the identity portion, these blocks are r×r identity matrices. The r-block PinD and DoutD are as before, but counting non-zero blocks.
For a given r,p,d, the goal is to find the largest value for n that provide a proper code (correct distance), subject to either no restrictions or specified restrictions on other parameters. We use the expression N(r,p,d) to denote this upper bound on n given the other parameters. We will see that as p increases, this value increases much more rapidly. We call this "horizontal stretch" because we stretch the horizontal parity. Similarly, as r increases, this value goes up quickly as well. We call this "v...