
Cloud Image Deduplication: an efficient massive deploy system

IP.com Disclosure Number: IPCOM000240178D
Publication Date: 2015-Jan-09
Document File: 7 page(s) / 295K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed are a system and method to facilitate efficient use of resources for cloud image deployment. This new deployment method takes advantage of the Copy-On-Write (COW) feature to quickly deploy multiple virtual machine instances and save system resources.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 47% of the total text.

Cloud players do not consider system resource usage when spawning new instances (i.e., virtual machines (VMs)). For disks, this is not a concern when using copy-on-write (QCOW) images on a few compute nodes (i.e., physical machines), since the image size increases only when data is written. Deploying a new instance also requires additional network bandwidth to transfer the image to the target compute host, and it consumes memory and processor cycles on the controller, the compute hosts, and the image service.

Figure 1: Cloud deployment overview

However, on larger machines with many cores, massive deployments waste system resources in the cloud through, for example:

Duplication of images
Occurrences of identical blocks of data in images across instances
Network traffic for image transferring
Processor cycles and memory required to clone the image for each instance

A more efficient approach to cloud deployment is needed to address three main problems.

The first problem is the parallel deployment of virtual machines. Massive deployments perform poorly because existing solutions spawn instances serially or, when they do so in parallel, inefficiently. Massive deployment consumes system resources (e.g., network, memory, Central Processing Unit (CPU), and storage). A system is needed to manage these resources in order to achieve better performance and utilization efficiency.

The second problem is cloud image cloning from the image service. Whenever an image is deployed, the compute node has to clone the image to its disk and spawn an instance from that cloned image. This process is slow, since it requires the image to be copied in its entirety from the image service. It also consumes system resources (e.g., network, processor, and disk). This traditional method, used by existing cloud players, does not scale optimally.

The third problem is cloud image duplication across the compute hosts. For each virtual machine, a new image is copied in its entirety, which results in multiple images containing the same data. In addition, images for similar workloads, such as web servers and databases, share many identical blocks.
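
As a rough illustration of the third problem, the following minimal Python sketch estimates how many blocks are redundant across a set of images. The 4096-byte block size, the use of SHA-256 digests, and the function names are assumptions made for illustration; the disclosure does not specify them.

```python
import hashlib
from collections import Counter
from typing import List

BLOCK_SIZE = 4096  # assumed block size; not taken from the disclosure


def block_hashes(image: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return the SHA-256 digest of each fixed-size block of an image."""
    return [
        hashlib.sha256(image[off:off + block_size]).hexdigest()
        for off in range(0, len(image), block_size)
    ]


def duplicate_block_ratio(images: List[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Fraction of all blocks that repeat a block already seen.

    Repeats are counted both across images and within a single image.
    """
    counts = Counter(h for img in images for h in block_hashes(img, block_size))
    total = sum(counts.values())
    unique = len(counts)
    return 0.0 if total == 0 else (total - unique) / total
```

For two images that share one of their two blocks, the function reports that a quarter of the stored blocks are redundant, which is the kind of waste a deduplicating deployment would try to avoid.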

The novel contribution is a system and method to facilitate efficient use of resources for cloud image deployment. This new deployment method takes advantage of the Copy-On-Write (COW) feature to quickly deploy multiple virtual machine instances and save system resources. The principle of COW is to use a base (backend) image containing the common (shared) blocks and to apply layers on top of it to build a custom image for each instance.
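
The COW principle described above can be sketched in a few lines of Python. This is a simplified in-memory model, not the disclosed implementation: the class names `BaseImage` and `CowLayer`, and the representation of an image as a mapping from block index to bytes, are hypothetical.

```python
from typing import Dict, Optional


class BaseImage:
    """Read-only backend image holding the blocks shared by all instances."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self._blocks = dict(blocks)

    def read(self, index: int) -> Optional[bytes]:
        return self._blocks.get(index)


class CowLayer:
    """Per-instance layer that stores only the blocks the instance has written."""

    def __init__(self, backing: BaseImage) -> None:
        self.backing = backing
        self.delta: Dict[int, bytes] = {}

    def read(self, index: int) -> Optional[bytes]:
        # Prefer the instance's own copy; otherwise fall through to the base.
        if index in self.delta:
            return self.delta[index]
        return self.backing.read(index)

    def write(self, index: int, data: bytes) -> None:
        # Writes never touch the base image; they land in this layer only.
        self.delta[index] = data
```

In this model, spawning another instance means creating another empty `CowLayer` over the same `BaseImage`, so no image data is copied until an instance writes a block.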

The main novelty is the capability of merging common blocks from the layers into the deduplication (dedup) snapshot layer. The core elements of the system and method are:


Deployment Affinity, which groups instances with the same image on the compute host...
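
One possible reading of Deployment Affinity is sketched below in Python. The available text is truncated, so everything here is an assumption for illustration: the function names, the per-host `capacity` limit, and the simple fill-one-host-then-the-next policy are not taken from the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_image(requests: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (instance_name, image_id) requests by image_id."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for instance, image in requests:
        groups[image].append(instance)
    return dict(groups)


def place_with_affinity(
    requests: List[Tuple[str, str]], hosts: List[str], capacity: int
) -> Dict[str, List[str]]:
    """Place instances so that those sharing an image land on the same host
    where capacity allows, spilling to the next host only when one is full."""
    placement: Dict[str, List[str]] = {host: [] for host in hosts}
    idx = 0
    for _image, instances in group_by_image(requests).items():
        for instance in instances:
            # Skip hosts that have no free slot left.
            while idx < len(hosts) and len(placement[hosts[idx]]) >= capacity:
                idx += 1
            if idx == len(hosts):
                raise RuntimeError("not enough capacity for all instances")
            placement[hosts[idx]].append(instance)
    return placement
```

Keeping instances of one image together means a host needs each base image at most once, which is what allows the per-instance layers to share it.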