Method for assisted volume manager recovery in a virtualized environment

IP.com Disclosure Number: IPCOM000232289D
Publication Date: 2013-Oct-30
Document File: 3 page(s) / 33K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method for assisted volume manager recovery in a virtualized environment is described. It uses an agent in the client to communicate requirements to a peer agent in the hypervisor or VIOS. Monitoring and probing are then handled by the hypervisor/VIOS peer, which sends a notification back to the client peer when the disk is healthy again. This eliminates the need for probing by the client; it can simply wait for the hypervisor/VIOS peer agent to give the all-clear and then attempt re-integration.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.

Many computer operating systems have a volume manager component that provides a logical volume abstraction based on an underlying set of physical disk devices. A logical volume manager will typically associate a group of physical disks together into a "volume group" and will allow creation of "logical volumes", which can be spread across the volume group in whatever way the creator chooses. The logical volumes provide an abstraction for the higher-level components in the system (such as file systems and applications) and free them from needing to understand the particulars of the underlying physical devices.
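The mapping from a logical volume onto the disks of a volume group can be sketched roughly as follows. This is only an illustration, not the layout used by any particular volume manager: the class names, the extent size, the round-robin allocation policy and the disk names (hdisk0, hdisk1) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

EXTENT_SIZE = 4  # blocks per extent; hypothetical value, kept small for readability


@dataclass
class LogicalVolume:
    """Maps logical extents onto (disk name, physical extent) pairs."""
    name: str
    # extent_map[i] is the (disk, physical extent index) backing logical extent i
    extent_map: list = field(default_factory=list)

    def resolve(self, logical_block):
        """Translate a logical block number into (disk, physical block)."""
        extent, offset = divmod(logical_block, EXTENT_SIZE)
        disk, physical_extent = self.extent_map[extent]
        return disk, physical_extent * EXTENT_SIZE + offset


@dataclass
class VolumeGroup:
    """A set of physical disks from which logical volumes are carved."""
    disks: dict  # disk name -> number of physical extents on that disk
    next_free: dict = field(default_factory=dict)  # disk name -> extents already used

    def create_lv(self, name, extents):
        """Allocate extents round-robin across the disks (an assumed policy)."""
        names = sorted(self.disks)
        free = sum(self.disks[d] - self.next_free.get(d, 0) for d in names)
        if extents > free:
            raise ValueError("not enough free extents in the volume group")
        lv = LogicalVolume(name)
        i = 0
        while len(lv.extent_map) < extents:
            disk = names[i % len(names)]
            i += 1
            used = self.next_free.get(disk, 0)
            if used < self.disks[disk]:  # skip disks that are already full
                lv.extent_map.append((disk, used))
                self.next_free[disk] = used + 1
        return lv


vg = VolumeGroup({"hdisk0": 2, "hdisk1": 2})
lv = vg.create_lv("lv00", 3)
print(lv.resolve(5))  # logical block 5 lies in logical extent 1
```

A file system using lv00 only ever sees logical block numbers; which disk actually holds a given block is decided entirely by the extent map.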

The volume manager typically tags each of the physical disks that participate in a volume group, and maintains some metadata about membership and logical volume layout. The metadata is typically written to some subset of the physical disks themselves.

The volume manager needs to track the state of the physical disks. If a disk becomes inaccessible or unreliable in some way, then the volume manager may fence it off and refuse further requests (e.g. read/write) to logical volume regions that map to the missing or sick disk.
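A minimal sketch of this state tracking and fencing, under stated assumptions: the state names, the DiskTracker class and its methods are hypothetical and chosen only for illustration, and a real volume manager would keep this state in its on-disk metadata rather than in memory.

```python
from enum import Enum


class DiskState(Enum):
    ACTIVE = "active"
    MISSING = "missing"  # fenced off: no requests are passed to the disk


class FencedDiskError(IOError):
    """Raised when a request maps to a fenced-off disk."""


class DiskTracker:
    """Tracks per-disk state and refuses I/O to fenced-off disks."""

    def __init__(self, disk_names):
        self.state = {name: DiskState.ACTIVE for name in disk_names}

    def report_failure(self, disk):
        # The disk became inaccessible or unreliable: fence it off.
        self.state[disk] = DiskState.MISSING

    def reintegrate(self, disk):
        # Return the disk to service (resynchronization is not modeled here).
        self.state[disk] = DiskState.ACTIVE

    def submit(self, disk, operation):
        # Refuse requests that map to a fenced disk instead of issuing them.
        if self.state[disk] is DiskState.MISSING:
            raise FencedDiskError(f"{disk} is fenced off; refusing {operation}")
        return f"{operation} issued to {disk}"


tracker = DiskTracker(["hdisk0", "hdisk1"])
tracker.report_failure("hdisk0")
try:
    tracker.submit("hdisk0", "write")
except FencedDiskError as err:
    print(err)
```

Note that nothing in this sketch ever moves a disk back to ACTIVE by itself; something outside the tracker has to call reintegrate.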

Volume managers may require intervention by a System Administrator to return fenced-off disks to service. In the AIX* logical volume manager, for example, a disk may be declared "MISSING" when it becomes unavailable, and an Administrator would have to issue a command (e.g. varyonvg) to restore the MISSING disk to service.

The dependence on manual intervention for re-integration of a disk may delay recovery from an outage and may increase the scope of a failure. (For example, consider a logical volume mirrored between disks A and B. Imagine that A becomes inaccessible for a time and then recovers, and a short while later B becomes inaccessible for a time and then recovers. If A was not re-integrated promptly, then the entire logical volume may be lost when B fails, even though one mirror or the other is always physically accessible.)
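The A/B scenario can be replayed with a small simulation. This is a simplified model under stated assumptions: the function name and event format are hypothetical, a disk is fenced the moment it fails, and the resynchronization of stale data that real re-integration requires is not modeled.

```python
def mirror_survives(events, reintegrate_promptly):
    """Replay outage events against a two-way mirror on disks A and B.

    events: (disk, "fail" | "recover") tuples in time order.
    Returns True if at least one mirror copy stays in service throughout.
    """
    in_service = {"A": True, "B": True}  # the volume manager's view of each disk
    for disk, event in events:
        if event == "fail":
            in_service[disk] = False  # volume manager fences the disk
        elif event == "recover" and reintegrate_promptly:
            in_service[disk] = True  # disk returned to service right away
        # Without prompt re-integration a recovered disk stays fenced.
        if not any(in_service.values()):
            return False  # no usable copy is left: the logical volume is lost
    return True


timeline = [("A", "fail"), ("A", "recover"), ("B", "fail"), ("B", "recover")]
print(mirror_survives(timeline, reintegrate_promptly=False))  # False
print(mirror_survives(timeline, reintegrate_promptly=True))   # True
```

In both runs the disks fail and recover at exactly the same times; the only difference is whether A is back in service before B fails.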

The volume manager could probe MISSING or fenced-off disks on a periodic basis, but this has drawbacks. The mere act of probing may activate error recovery paths in the components below the volume manager that manage the physical disks. That in turn may have various negative consequences:

- Error notifications may be produced, which may alarm users or administrators

- If error notifications are excessive, then newer notifications may displace or overwrite older notifications, possibly including useful information about the initial failure

- Recovery actions may escalate beyond the failed device to affect other devices. (For example, in a SCSI environment the host may issue a LUN Reset to reset the single misbehaving disk. If that fails, the host may escalate to a Target Reset, which may disrupt activity on other disks not affected by the original failure.)

- Recovery actions may be tim...