Disclosed is a method that uses experience gained from chaos (controlled failure) testing to reduce the time needed to recover from environmental issues.

Tools exist on the market that induce failures and uncover issues related either to a specific service or to a specific environment. Both types of failure cause downtime in the production environment. Environmental issues are issues that are not related to the service code but to the production environment itself. Such issues would not be uncovered unless a particular sequence or sequences of failures occurred. However, there is currently no way to use the knowledge gathered during chaos testing to reduce the time needed to recover from probable environmental issues.

As part of this invention we propose to

1) Identify the set of actions required to recover from any environmental issues uncovered during controlled failure tests, and abstract this set as one or more templates.

2) Customize and apply these templates as remediation fixes in the target production environment, and automatically invoke the fix when the need arises.
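The two proposals above can be illustrated with a minimal sketch. It assumes a template is simply an ordered list of recovery actions keyed by a failure type, and that some monitoring component reports the failure type when it occurs. The names RemediationTemplate, RemediationRegistry, on_failure and the two sample actions are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: a recovery action receives the environment state and repairs it.
Action = Callable[[dict], None]


@dataclass
class RemediationTemplate:
    """Ordered recovery actions for one type of environmental failure."""
    failure_type: str
    actions: List[Action] = field(default_factory=list)

    def apply(self, env: dict) -> None:
        # Run the recovery actions in the order they were captured.
        for action in self.actions:
            action(env)


class RemediationRegistry:
    """Holds the templates enabled for one target environment."""

    def __init__(self) -> None:
        self._templates: Dict[str, RemediationTemplate] = {}

    def enable(self, template: RemediationTemplate) -> None:
        self._templates[template.failure_type] = template

    def on_failure(self, failure_type: str, env: dict) -> bool:
        """Invoke the matching template; return False if none is enabled."""
        template = self._templates.get(failure_type)
        if template is None:
            return False
        template.apply(env)
        return True


# Illustrative actions operating on a simulated environment.
def free_disk_space(env: dict) -> None:
    env["disk_used_pct"] = 40


def restart_service(env: dict) -> None:
    env["service_up"] = True


registry = RemediationRegistry()
registry.enable(RemediationTemplate("disk_full", [free_disk_space, restart_service]))

env = {"disk_used_pct": 100, "service_up": False}
handled = registry.on_failure("disk_full", env)
```

In this sketch an unknown failure type is reported back as unhandled rather than raising an error, so that the caller can escalate it to a person.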

Steps to recover the service from failures using templates

Issues and failures can be recovered from in multiple ways, depending on the environment in which the target service is running, so the set of recovery actions for environmental issues may differ between environments. Each target service team can therefore customize the templates, or even define its own, using the proposed flow chart (Proposed process to capture template).

The "Proposed process to capture template" should be executed by target service development teams to define their own templates. The chaos service can offer pre-defined templates as standards, following the same "Proposed process to capture template", but specific to a pre-defined set of target environments.
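As a rough illustration of how a team might customize a standard template or substitute its own, the sketch below treats a template as a list of parameterised step descriptions. The STANDARD_TEMPLATES catalogue, the customize function, the failure type and the step wording are all assumptions made for this example, not details taken from the disclosure.

```python
from string import Template
from typing import Dict, List, Optional

# Hypothetical catalogue of pre-defined templates offered by the chaos service.
STANDARD_TEMPLATES: Dict[str, List[str]] = {
    "node_unreachable": [
        "drain traffic from $node",
        "restart $service on $node",
        "restore traffic to $node",
    ],
}


def customize(
    failure_type: str,
    params: Dict[str, str],
    team_templates: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Fill in environment-specific values for one template.

    A team-defined template, if present, takes precedence over the standard one.
    """
    team_templates = team_templates or {}
    steps = team_templates.get(failure_type, STANDARD_TEMPLATES.get(failure_type))
    if steps is None:
        raise KeyError(f"no template defined for {failure_type!r}")
    # substitute() raises KeyError if a required parameter is missing.
    return [Template(step).substitute(params) for step in steps]


# A team reusing the standard template with its own values.
standard_fix = customize("node_unreachable", {"node": "node-7", "service": "checkout"})

# A team overriding the standard template with its own definition.
own_fix = customize(
    "node_unreachable",
    {"node": "node-7"},
    team_templates={"node_unreachable": ["replace $node with a fresh instance"]},
)
```

Keeping the environment-specific values outside the template is one way to let the same standard template serve several target environments.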

The following steps are proposed as part of this invention to define templates and to recover a target service with a pre-defined template. Some failures (either environment-related or related to the service/application) are repeated and cannot be handled in code.

1) Induce failures with the chaos service

2) Identify the set of failures for the target environment

3) Identify the set of actions to recover the environment from failures in the form of templates

4) Customize the template as a generic remediation fix for the failures to work in the target environment

5) Enable the templates on the real production environment

6) Failures will be recovered with pre-defined templates

Define process to create templates for the new...
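The capture part of this flow (inducing failures, identifying the failures that occur, and recording the recovery actions for them) might be sketched as follows. This is a simulation under stated assumptions: each experiment is a plain function that injects one fault into a dictionary standing in for the environment and returns the name of the failure it exposed, if any. The experiment names, failure names and recovery steps are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: an experiment injects a fault and returns the failure it exposed.
Experiment = Callable[[dict], Optional[str]]


def kill_dns(env: dict) -> Optional[str]:
    env["dns_up"] = False
    return "dns_outage"


def fill_disk(env: dict) -> Optional[str]:
    env["disk_used_pct"] = 100
    return "disk_full"


def drop_one_replica(env: dict) -> Optional[str]:
    env["replicas"] -= 1
    # Assumption: the simulated environment tolerates running on two replicas.
    return None if env["replicas"] >= 2 else "quorum_lost"


def capture_templates(
    env: dict,
    experiments: List[Experiment],
    known_recoveries: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Run the experiments and pair each observed failure with its recovery steps.

    Returns the captured templates and the failures that still lack one.
    """
    observed: List[str] = []
    for experiment in experiments:
        failure = experiment(env)
        if failure is not None and failure not in observed:
            observed.append(failure)
    templates = {f: known_recoveries[f] for f in observed if f in known_recoveries}
    missing = [f for f in observed if f not in known_recoveries]
    return templates, missing


env = {"dns_up": True, "disk_used_pct": 35, "replicas": 3}
recoveries = {"disk_full": ["rotate logs", "expand volume"]}
templates, missing = capture_templates(
    env, [kill_dns, fill_disk, drop_one_replica], recoveries
)
```

Here the failures listed in missing are those for which the team has not yet recorded recovery actions, so they cannot be enabled in production until a template is defined for them.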