Detection of hard disk drive media defects and repair of computer filesystems using only operating system level commands.

The following method is used to detect hard disk drive (HDD) media defects and repair computer filesystems with those defects. The detection and repair method requires no special hardware or software. Only operating system commands are used for the HDD defect detection and filesystem repair.

Summary of the problem

A computer hard disk drive (HDD) contains a disk surface, the media, which stores information. The surface of the media is divided into sectors by the disk or operating system when the disk is formatted. The HDD sectors are assigned to computer filesystems when a filesystem is created, expanded, or repaired. Information is written to and read from individual sectors. Over time, small areas of the HDD media fail and can no longer be written to or read from. The HDD sector with the bad media may or may not be part of a filesystem and, if it is part of a filesystem, may or may not contain data. HDD media defects can go undetected until the disk attempts a read or write operation on the sector with the failed media.

A computer filesystem can be protected from HDD media defects by making and maintaining a mirror copy of all HDD sectors assigned to the filesystem. The mirror copy does not need to be on a different HDD, but it is typically made on a separate HDD to also protect against a catastrophic HDD failure that prevents access to any sector on the disk. If a bad HDD sector is encountered during a read or write of a mirrored filesystem, the corresponding mirror copy is used for the read or write. The bad sector cannot be repaired, but the HDD microcode or operating system can substitute an unused sector for it. The bad sector is removed from the filesystem and replaced by an unused sector, and the data from the corresponding mirror copy sector is written to the new sector.
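
The mirror fallback and sector substitution described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the mechanism of any particular HDD microcode or operating system: the names Disk, reassign, mirrored_write, and mirrored_read are hypothetical, and the model of a fixed pool of spare sectors per disk is an assumption made for illustration.

```python
class Disk:
    """Toy model of an HDD: numbered sectors plus a pool of unused spares.

    Hypothetical model for illustration only; real drives and operating
    systems track defects and spare sectors in their own ways.
    """

    def __init__(self, num_sectors, num_spares=2):
        self.sectors = {}   # physical location -> stored bytes
        self.bad = set()    # physical locations with failed media
        self.remap = {}     # logical sector -> substituted spare location
        self.spares = list(range(num_sectors, num_sectors + num_spares))

    def _location(self, sector):
        # A substituted sector is served from its spare location.
        return self.remap.get(sector, sector)

    def read(self, sector):
        loc = self._location(sector)
        if loc in self.bad:
            raise OSError(f"media error reading sector {sector}")
        return self.sectors.get(loc, b"")

    def write(self, sector, payload):
        loc = self._location(sector)
        if loc in self.bad:
            raise OSError(f"media error writing sector {sector}")
        self.sectors[loc] = payload

    def reassign(self, sector):
        """Substitute an unused spare sector for a bad sector."""
        if not self.spares:
            raise OSError("no spare sectors left")
        self.remap[sector] = self.spares.pop(0)


def mirrored_write(disk_a, disk_b, sector, payload):
    """Write the same payload to both copies of the sector."""
    for disk in (disk_a, disk_b):
        try:
            disk.write(sector, payload)
        except OSError:
            # Bad media found on write: substitute a spare, then retry.
            disk.reassign(sector)
            disk.write(sector, payload)


def mirrored_read(disk_a, disk_b, sector):
    """Read from disk_a; on a media error, use the mirror copy on disk_b.

    The bad sector on disk_a is replaced by a spare and refilled from the
    mirror. If disk_b cannot supply the sector either, the OSError
    propagates, because no good copy remains.
    """
    try:
        return disk_a.read(sector)
    except OSError:
        payload = disk_b.read(sector)
        disk_a.reassign(sector)
        disk_a.write(sector, payload)
        return payload
```

In this sketch, marking a sector of one disk as bad (for example, adding its number to that disk's bad set) and then calling mirrored_read returns the data from the other disk and leaves the first disk with a substituted sector holding the same data. A read from a single unmirrored Disk with a bad sector simply raises an error, which is the unprotected case.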

However, an HDD media defect may go undetected for a long period of time if the sector is not read or written. Once an HDD sector fails, that sector of the filesystem is no longer protected by mirroring, but the exposure is unknown to the operating system. During this period of exposure, the HDD containing the remaining good copy of the sector may fail catastrophically, leaving no good copy of at least one sector of the filesystem. This may mean data loss if the bad sector contains filesystem data. The scenario is described in detail below.

A pair of hard disk drives, disk (a) and disk (b), is used to create a mirrored filesystem. Each sector on disk (a) is mirrored to disk (b). A sector of data on disk (a) belonging to the filesystem goes bad (unreadable and unwritable); this is a media failure. The sector is not read or written for a period of time. The failure goes undetected by the disk and the operating system; this is called a latent defect. The other hard disk, disk (b), fails catastrophically. The operating system detects the failure of disk (b). Disk (b) is physically replaced. The operating system at...