Enhanced Virtual Machine Availability During System Crashes in Cloud-like Environments

In the cloud, reliability, availability, and serviceability are key areas of focus. This disclosure presents a recovery mechanism that enhances VM availability even after a crash, without losing any serviceability aspects.


Disclosed is a system that boots a crashed VM (Virtual Machine) immediately, providing enhanced application availability while maintaining all serviceability aspects of a VM crash.

In a system dump, system memory is captured to a dump device. The contents of this device are then retrieved and saved as a file that tools can process to find the root cause of the system entering a panic state. A system dump is therefore one of the major features for achieving the desired serviceability, because it shows the memory contents exactly as they were when the system panicked.

Existing procedures for taking a system dump restrict themselves to the memory assigned to that partition. This implies that whenever a system dump has to be taken, the memory of the failed state must first be saved, and only then can the system boot. As a result, the system is unavailable for use for some time.

In the current cloud scenario, demand for system resources needs to be fulfilled more aggressively, with many virtual machines moving around in the cloud. If a system dump happens, the system is unavailable as noted above, and this downtime could range from several minutes to hours, which is least expected in a cloud-like environment.

The idea is to give the partition additional memory from a pool of free memory so that it can boot immediately. Once it is up, a process dumps the problem-state memory in parallel while the applications start as usual.
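The parallel step described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names (dump_crashed_memory, start_applications, recover) are hypothetical, a Python list stands in for the dump device, and a thread stands in for the dump process that runs after the VM has rebooted.

```python
import threading


def dump_crashed_memory(crashed_pages, dump_file):
    """Copy the preserved crash-time pages into the dump (a list stands in for the dump device)."""
    for page in crashed_pages:
        dump_file.append(bytes(page))


def start_applications(log):
    """Placeholder for bringing the applications back up on the fresh memory."""
    log.append("applications started")


def recover(crashed_pages):
    """Start applications right away while the old memory is dumped in the background."""
    dump_file, log = [], []
    dumper = threading.Thread(
        target=dump_crashed_memory, args=(crashed_pages, dump_file)
    )
    dumper.start()           # the dump runs in parallel
    start_applications(log)  # applications do not wait for the dump to finish
    dumper.join()            # joined here only so the sketch can return the dump
    return dump_file, log
```

In this sketch the applications never wait on the dump. The only synchronization is the final join, which a real system would replace with whatever step releases the preserved memory once the dump is complete.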

There are two aspects of the disclosed system:

1. Application availability: achieved by immediately booting the crashed VM on a memory region other than the current one.

A VM is not tied to specific physical memory; it sees only virtualized memory, which can physically reside anywhere in RAM (Random Access Memory). The hypervisor keeps track of which physical pages are mapped to which VM, and this mapping is done with the help of the HPT (Hardware Page Tables). Hence, with the hypervisor's help, a VM can be booted immediately just by changing the memory mapping for that VM.
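The remapping idea can be sketched as a toy model. This is an illustration under stated assumptions, not real hypervisor code: the Hypervisor class, its method names, and the use of integer page numbers are all hypothetical, and a plain dictionary stands in for the hardware page tables.

```python
class Hypervisor:
    """Toy model: tracks which physical pages back each VM."""

    def __init__(self, total_pages):
        self.free_pool = set(range(total_pages))  # unassigned physical pages
        self.mapping = {}    # vm_id -> physical pages currently backing the VM
        self.preserved = {}  # vm_id -> crash-time pages kept for the dump

    def assign(self, vm_id, n_pages):
        """Back a VM with n_pages taken from the free pool."""
        if n_pages > len(self.free_pool):
            raise MemoryError("not enough free pages")
        pages = sorted(self.free_pool)[:n_pages]
        self.free_pool -= set(pages)
        self.mapping[vm_id] = pages
        return pages

    def remap_after_crash(self, vm_id):
        """Point the VM at fresh pages; keep the old ones untouched for dumping."""
        old_pages = self.mapping[vm_id]
        if len(old_pages) > len(self.free_pool):
            raise MemoryError("not enough free pages to reboot the VM")
        self.preserved[vm_id] = old_pages
        return self.assign(vm_id, len(old_pages))

    def release_preserved(self, vm_id):
        """Return the crash-time pages to the free pool once the dump is done."""
        self.free_pool |= set(self.preserved.pop(vm_id))
```

For example, after hv = Hypervisor(8) and hv.assign("vm1", 3), calling hv.remap_after_crash("vm1") backs the VM with three different pages while the original three remain in hv.preserved until release_preserved is called.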

Whenever VM gets a cra...