Parallelizing Dump and Operating System Boot in a Virtualized Environment

IP.com Disclosure Number: IPCOM000223159D
Publication Date: 2012-Nov-05
Document File: 10 page(s) / 61K

Publishing Venue

The IP.com Prior Art Database

Abstract

A technique to achieve early parallelism of dump and operating system restart is presented, based on the infrastructure of IBM System P platforms. Existing designs, such as Firmware Assisted Dump for AIX, transfer the dump content to a dump device after performing the first phases of system reboot. This has the advantage that a larger set of operating system functions is available to perform the dump. However, on large systems the boot process itself (firmware communication, device initialization, etc.) might take significant time during which no dump I/O occurs. The presented design for concurrent dump and operating system reboot relies on functions provided by a hypervisor and virtualization infrastructure to maintain two operating system instances concurrently on one set of resources (CPU, memory and I/O devices) for the duration of the dump. A partition that virtualizes I/O, the Virtual I/O Server, creates and maintains clones of devices that are required to perform the dump yet must also exist in the newly started operating system at boot time. CPU and memory are transferred from the dumping operating system to the newly started one as they are freed by the dump process.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 14% of the total text.

Dumps are required for error analysis. They can take considerable time to complete, however, and the system is unavailable during that time, which extends outages and negatively affects Service Level Agreements (SLAs). Many users do not enable dump collection because they cannot afford the downtime it incurs.

Several techniques are used to reduce dump time:

Parallelization of Dump Processing: Several tasks during a dump (determination of the memory regions to be dumped, compression, I/O) can be performed in parallel. Parallelization alone does not significantly reduce dump time, though, because the overall dump process is I/O bound. Dump completion time depends on the I/O speed of the dump device, which ranges between 320 MB/s for an internal SCSI drive and 1 GB/s for a Fibre Channel device. In a virtualized setting, I/O bandwidth is also shared with other partitions.

Early Reboot: The memory region that contains the dump content is preserved across reboot and then written to disk. Dump content might be moved to a designated memory region before reboot. This has the advantage that the system is available earlier, although with a reduced amount of memory, which is gradually released back as it is freed by writing dump content to disk. AIX* Firmware Assisted Dump implements this [6]; [1], [2] and [3] also describe techniques for early reboot.

Compression: An obvious technique that is used in the majority of system dumps.

Early Error Analysis: If the dump is completed after reboot, it can be analyzed first and then either aborted if the symptom is known, or reduced to the data assumed to be pertinent to the cause of the crash [1].
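
The I/O-bound argument made under "Parallelization of Dump Processing" can be illustrated with a small throughput model. The sketch below is illustrative only: the function name `dump_seconds`, the per-worker compression rate, the compression ratio and the device bandwidth are assumed values, not measurements from the design described here.

```python
def dump_seconds(dump_bytes, workers, compress_rate_per_worker,
                 compression_ratio, device_rate, device_share=1.0):
    """Estimate dump time for parallel compression feeding one dump device.

    All rates are in bytes per second. The slower stage sets the pace.
    """
    # Uncompressed bytes per second the CPUs can compress.
    cpu_rate = workers * compress_rate_per_worker
    # Uncompressed bytes per second the device can absorb: it only writes
    # compressed output and may share bandwidth with other partitions.
    io_rate = device_rate * device_share * compression_ratio
    return dump_bytes / min(cpu_rate, io_rate)


# Hypothetical figures: 64 GB of dump data, 200 MB/s compression per worker,
# 4:1 compression, and a dump device that sustains 100 MB/s.
params = dict(dump_bytes=64 * 10**9, compress_rate_per_worker=200 * 10**6,
              compression_ratio=4, device_rate=100 * 10**6)
for workers in (1, 2, 8):
    print(workers, "worker(s):", dump_seconds(workers=workers, **params), "s")
```

Under these assumed numbers, going from one to two workers halves the estimate, but further workers change nothing because the device is saturated. Only a faster device, a larger bandwidth share or a better compression ratio moves the result.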

Minimizing the impact of a system dump on the uptime of a machine and on the availability of system resources (CPU, memory and I/O bandwidth) is an ongoing challenge and an important factor in the perceived serviceability of a machine.


2. Architecture for Concurrent Dump and System Reboot

A design for concurrent dump and operating system restart is presented to achieve parallelism of dump and system reboot during early boot phases. The system boot process itself has little parallelism and uses minimal CPU and memory resources. Utilizing CPU, memory and I/O bandwidth for the dump process while rebooting reduces the idle time of these resources and makes them available earlier for use by the rebooted system. While the system dump is ongoing, two separate operating environments exist on the set of resources: the dump process on a subset of the partition's memory and CPU, and the operating system, still booting or already operational, on the remaining CPU and memory.
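
The handover of resources between the two operating environments can be pictured with a toy model. This is a sketch under stated assumptions, not the hypervisor interface: `Environment`, `concurrent_dump` and `write_region` are hypothetical names, memory is modeled as numbered regions, and CPUs are handed over only once the dump completes, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """One of the two operating environments sharing the partition."""
    name: str
    cpus: int
    regions: list = field(default_factory=list)  # ids of owned memory regions


def concurrent_dump(dumper, new_os, write_region):
    """Write each dump region, then hand the freed region to the new OS.

    Yields (regions still to dump, regions owned by the new OS) per step.
    """
    while dumper.regions:
        region = dumper.regions.pop(0)
        write_region(region)            # dump I/O for this region
        new_os.regions.append(region)   # freed memory moves to the new OS
        yield len(dumper.regions), len(new_os.regions)
    # Simplifying assumption: CPUs move only when the dump is complete.
    new_os.cpus += dumper.cpus
    dumper.cpus = 0


# Hypothetical partition: 4 memory regions and 4 CPUs. The new OS boots on
# one region and two CPUs while the dump environment drains the other three.
written = []
dumper = Environment("dump", cpus=2, regions=[0, 1, 2])
new_os = Environment("new-os", cpus=2, regions=[3])
progress = list(concurrent_dump(dumper, new_os, written.append))
print(progress, new_os)
```

The generator form makes the gradual release visible: after every written region the rebooted system owns one more region, instead of waiting for the whole dump to finish.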

The design is described for AIX / System P** platforms and virtualization infrastructure. The following steps outline the actions taken during dump and reboot.


1. Provision of a minimal set of resources to s...