Dynamically improving virtual appliance deployment through error recovery plan

IP.com Disclosure Number: IPCOM000238425D
Publication Date: 2014-Aug-26
Document File: 5 page(s) / 76K

Publishing Venue

The IP.com Prior Art Database

Abstract

Most cloud management solutions in this space, however, face repeated failures when deploying virtual appliances. This not only delays development and testing but also burdens administrators, who must repeatedly fix similar errors to make the cloud functional again.

We propose a generic approach for cloud-based systems management solutions that frees virtual server deployment from errors seen frequently in the past and ensures that they do not recur.

Our teaching proposes analyzing these errors, categorizing them by root cause, and notifying the administrator to define an alternate solution or error recovery plan. The solution learns dynamically at run time: it adapts to deployment errors, learns from past errors, and executes the defined error recovery plans in subsequent deployments. This improves subsequent virtual appliance deployments and lets the cloud improve on its own while the administrator retains full control.

We propose a dynamic self-learning console that lets the administrator see errors categorized by root cause and frequency and define the action to execute the next time an error occurs. The action could be a simple command, script, or program to run when the error recurs, or it could be a step such as escalating to management, sending an SMS to the administrator, restarting the cloud, or reattempting the deployment after a certain period of time. We do not claim the nature of the error recovery or action plan.
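One way to picture the administrator-defined plans is as a lookup from error category to action. The sketch below is illustrative only and is not the disclosure's implementation: the names `RecoveryRegistry`, `define_plan`, `handle`, and `shell_action` are hypothetical, and error categories are assumed to be plain strings.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An action receives the error detail and reports whether recovery succeeded.
Action = Callable[[str], bool]


@dataclass
class RecoveryRegistry:
    """Hypothetical store of administrator-defined recovery plans."""
    plans: Dict[str, Action] = field(default_factory=dict)
    unhandled: List[str] = field(default_factory=list)

    def define_plan(self, category: str, action: Action) -> None:
        # The administrator attaches an action to an error category.
        self.plans[category] = action

    def handle(self, category: str, detail: str) -> bool:
        action = self.plans.get(category)
        if action is None:
            # No plan yet: record it so a console could prompt the administrator.
            self.unhandled.append(category)
            return False
        return action(detail)


def shell_action(command: List[str]) -> Action:
    """Wrap a command or script as an action (assumes it signals success with exit code 0)."""
    def run(detail: str) -> bool:
        return subprocess.run(command, check=False).returncode == 0
    return run
```

Under these assumptions, a category with no plan is never acted on automatically; it is only recorded, which keeps the choice of recovery action with the administrator.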

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 32% of the total text.

Infrastructure as a service is an essential form of cloud management for most organizations. It enables rapid development and testing of applications, thereby improving productivity and reducing infrastructure and administration costs.

Most cloud management solutions in this space, however, face repeated failures when deploying virtual appliances. This not only delays development and testing but also burdens administrators, who must repeatedly fix similar errors to make the cloud functional again.

Administrators typically see deployment errors that follow similar patterns, e.g. running out of addresses in the available IP address pool, failing to provision a storage volume, failing to execute some nim command, or failing to provision the VLAN on the virtual switch. They typically know the fix and have to apply it manually whenever an instance creation fails.
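Recurring failures of this kind could be grouped by root cause with simple pattern matching. The following is a minimal sketch: the category names, regular expressions, and message formats are hypothetical and are not taken from any real product's logs.

```python
import re

# Hypothetical root-cause categories and the message patterns mapped to them.
# A real installation would derive these from its own deployment logs.
CATEGORY_PATTERNS = [
    ("ip_pool_exhausted", re.compile(r"no (free|available) ip address", re.I)),
    ("storage_provisioning_failed", re.compile(r"storage volume", re.I)),
    ("nim_command_failed", re.compile(r"\bnim\b", re.I)),
    ("vlan_provisioning_failed", re.compile(r"\bvlan\b", re.I)),
]


def categorize(message: str) -> str:
    """Return the first category whose pattern matches, else 'uncategorized'."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(message):
            return category
    return "uncategorized"
```

Messages that match no pattern fall into `uncategorized`, so new kinds of failure remain visible instead of being silently forced into an existing category.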

We have seen this happen several times with solutions such as IBM Systems Director VM Control and IBM Smart Cloud Entry. Errors of this kind delay the work of end users (developers, testers, deployers) and consume administrators' time that could be spent on more pressing matters that add value to the business.

Existing cloud solutions have error recovery plans in the form of certain fixed behaviors. They generally attempt the deployment operation again, but the retry may not succeed unless the underlying error is actually fixed. Some solutions execute a predefined alternate path that is embedded in the solution's own program logic. None of these solutions is dynamic enough, and none gives the administrator control to customize the error recovery or define a plan. Error recovery scenarios and fixes are mostly specific to each customer environment and cannot be hard-coded.

Existing cloud solutions also do not analyze errors from historic data, categorize them, or plot the most common ones. This makes it hard for administrators to define error recovery plans or take action.
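The kind of historical analysis that is missing can be sketched as a frequency count over past failures. This example assumes each past failure has already been reduced to a category string; the function name `rank_error_categories` and the sample category names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_error_categories(history: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (category, count) pairs, most frequent first."""
    return Counter(history).most_common()


# Illustrative history of categorized deployment failures.
past_failures = [
    "ip_pool_exhausted", "vlan_provisioning_failed", "ip_pool_exhausted",
    "storage_provisioning_failed", "ip_pool_exhausted", "vlan_provisioning_failed",
]
ranking = rank_error_categories(past_failures)
```

A ranking like this could feed a chart or console view, letting the administrator write recovery plans for the most frequent categories first.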

There is a need for a solution that learns dynamically and makes the cloud function better by learning from previous errors. At the same time, the administrator should retain full control over how the cloud tries to recover from an error, without having to keep watch over it.

Prior Art

1. CN10262922: Method and device of integrated data disaster recovery based on cloud Platform. This covers configuring data disaster recovery strategies for applications deployed on a cloud platform and monitoring the running status of those applications. It is unrelated to the recovery strategies for virtual machine provisioning tasks proposed in this disclosure.

We propose a...