
Distributed Check-Sum Approach for Clustered Disk Drives

IP.com Disclosure Number: IPCOM000037045D
Original Publication Date: 1989-Nov-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 4 page(s) / 149K

Publishing Venue

IBM

Related People

Donaldson, JE: AUTHOR [+7]

Abstract

Described is an approach for distributing check-sum data across clustered direct access storage devices (DASDs) to eliminate the burden of check-summing on one device. This minimizes the performance degradation associated with check-summing by averaging the check-sum maintenance task among all the devices. All devices are used for data operations and appear to have high performance actuators.

(Image Omitted)

In a system configuration with grouped or clustered DASDs, as shown in Figs. 1 and 2, check-summing is implemented as a data integrity feature. The check-summing algorithm is handled by the system in Fig. 1, or by the interface logic in Fig. 2.

As shown in Table 1, with clustered DASDs numbered 1, 2, 3, ..., up to n parallel devices, a portion of the check-sum data appears on each device. The table is formed by breaking the data area of each device into n equally sized areas (as many data areas, or rows, in the table as there are devices, or columns), where M is the largest data block address (DBA) on the smallest-capacity device such that (M+1)/n is a whole number. This assures that the area left for check-sum data is at least as big as each of the n-1 data areas, each of whose size is (M+1)/n DBAs. The check-sum data are contained in the nth data area of each device.
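For concreteness, the following is a minimal Python sketch of this partitioning; the numbers and names are illustrative and not from the disclosure, and DBAs are assumed to run from 0 to M on every device.

    def table_areas(n, m):
        """Split DBAs 0..m of a device into n equal areas of (m+1)//n DBAs.

        Returns the (first_dba, last_dba) range of each area (row); rows
        1..n-1 hold data and row n holds the device's check-sum area.
        """
        assert (m + 1) % n == 0, "(M+1)/n must be a whole number"
        size = (m + 1) // n
        return [(row * size, (row + 1) * size - 1) for row in range(n)]

    # Example: a four-device cluster whose smallest member tops out at
    # M = 1999 gives four areas of 500 DBAs each on every device.
    for row, (first, last) in enumerate(table_areas(4, 1999), start=1):
        kind = "check-sum" if row == 4 else "data"
        print(f"area {row}: DBAs {first}-{last} ({kind})")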

To generate check-sum data, consider the check-sum area on some DASD (column) j. Go to the data area in row 1, column j+1 and proceed diagonally from (row, column) (1, j+1) across data areas (2, j+2), (3, j+3), and so on, until reaching data area (n-1, j-1), doing a bit-by-bit Exclusive-OR (XOR) of the corresponding data bits of the n-1 data areas. This places the check-sum data on a device separate from all of the data that makes up the check-sum, which is of course a requirement of the algorithm. Note that the diagonal wraps from column n to column 1. Also note that any row or column term resulting in zero is outside the table and so is discarded.
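A minimal Python sketch of this diagonal XOR follows; it is an illustration under stated assumptions, not the disclosure's implementation. Indices are 0-based (the disclosure's tables are 1-based), devices[c][r] is assumed to hold the data area in row r of column c as bytes, and row n-1 of each device is its check-sum area.

    from functools import reduce
    import os

    def xor_blocks(blocks):
        """Bit-by-bit Exclusive-OR of equally sized byte blocks."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def checksum_for_device(devices, j):
        """Check-sum area for device (column) j.

        Starting one column to the right of j, the diagonal visits rows
        0..n-2, wrapping past the last column back to the first, so none
        of the XORed data areas lives on device j itself.
        """
        n = len(devices)
        return xor_blocks([devices[(j + r + 1) % n][r] for r in range(n - 1)])

    # Demo: four devices with 16-byte areas; rows 0..2 hold data and
    # row 3 of each device receives that device's check-sum area.
    n, size = 4, 16
    devices = [[os.urandom(size) for _ in range(n - 1)] + [None]
               for _ in range(n)]
    for j in range(n):
        devices[j][n - 1] = checksum_for_device(devices, j)

Applying the diagonal rule with n = 4, the check-sum on device 1 covers data areas (1, 2), (2, 3), and (3, 4), and likewise around the cluster for the other devices.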

(Image Omitted)

Tables 2 and 3 show the distributed check-sum implementations for the cases when the cluster contains three devices and four devices, respectively. Table 1 is used to generate check-sum tables for other cluster populations. If the cluster consists of only two devices, the DASDs are mirrored: each check-sum then covers a single data area, and the XOR of a single area is simply a copy of it, so both devices have identical copies of the data.

Check-summing provides for extra data recovery action: if a sector is not recovered by the device's own data recovery procedures, or if one of the devices becomes inoperative, the lost data can be recreated from the other n-1 devices. The recreated data are then returned to the user and rewritten on a newly reallocated data sector, or restored to the new (replacement) device. To illustrate the restore process, assume that DASD 3 is replaced in a four-DASD cluster which has check-summing implemented. Refer to Table 3. Data area (1, 3) of the newly installed ...
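The extract breaks off mid-illustration here. As a hedged sketch of the recovery arithmetic it describes, under the same 0-indexed assumptions as the earlier sketch (devices[c][r] holds data areas, row n-1 holds each device's check-sum, and recover_data_area is an illustrative name), a lost data area is rebuilt by XORing its diagonal's check-sum with the diagonal's surviving data areas:

    from functools import reduce

    def xor_blocks(blocks):
        """Bit-by-bit Exclusive-OR of equally sized byte blocks."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def recover_data_area(devices, failed, row):
        """Recreate data area (row, failed) from the surviving n-1 devices.

        A data area in row r of column c lies on the diagonal whose
        check-sum sits on device j = (c - r - 1) mod n, so XORing that
        check-sum with the diagonal's n-2 surviving data areas yields
        the lost data.
        """
        n = len(devices)
        j = (failed - row - 1) % n
        pieces = [devices[j][n - 1]]             # the diagonal's check-sum
        pieces += [devices[(j + r + 1) % n][r]   # its surviving data areas
                   for r in range(n - 1) if (j + r + 1) % n != failed]
        return xor_blocks(pieces)

    # For the disclosure's example, data area (1, 3) of the replaced
    # DASD 3 in a four-device cluster corresponds to
    # recover_data_area(devices, 2, 0) in this 0-indexed sketch.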