
Method to recreate chaos scenarios for which service was not resilient

IP.com Disclosure Number: IPCOM000246365D
Publication Date: 2016-Jun-02
Document File: 3 page(s) / 152K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to recreate resilience and recoverability issues that were observed during one phase of chaos testing of a service. This method enables developers to debug the issues in a more controlled environment.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.



A microservices architecture by nature operates on a distributed network, so performance and resilience are vital. Any failure risks penalties. To validate the infrastructure and its availability, Chaos Monkey introduces systematic chaos into the application's operations. It deliberately destroys resources in production environments at a time of day when most engineers are available to fix any errors that occur. By inducing failures, it helps test the resilience and recoverability of the microservices.
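The behaviour described above (random destruction of a resource, restricted to hours when engineers are on hand) can be pictured with a minimal sketch. This is an illustration only, not Chaos Monkey's actual code: the working-hours window, the function names, and the terminate callback are all assumptions introduced here.

```python
import random
from datetime import datetime, time
from typing import Callable, List, Optional

# Hypothetical working-hours window (assumption; not taken from the disclosure).
WORK_START = time(9, 0)
WORK_END = time(15, 0)


def within_working_hours(now: datetime) -> bool:
    """Return True on a weekday between WORK_START and WORK_END."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def run_chaos_round(
    instances: List[str],
    now: datetime,
    rng: random.Random,
    terminate: Callable[[str], None],
) -> Optional[str]:
    """Terminate one randomly chosen instance if 'now' is inside the window.

    'terminate' is a caller-supplied callback standing in for a real
    infrastructure call. Returns the chosen instance, or None if no
    action was taken.
    """
    if not within_working_hours(now) or not instances:
        return None
    victim = rng.choice(instances)
    terminate(victim)
    return victim
```

Passing the random generator and the clock in as arguments keeps the sketch deterministic under test; a real tool would read both from its environment.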

Chaos Monkey applies randomization and disruption to the services in the infrastructure so that developers build in automation that limits the impact users experience when big problems do occur. However, one primary problem with the current chaos-testing technique is that it provides no mechanism for recreating the same resiliency and recoverability problems in a developer-chosen environment where they can be debugged. Although the cloud community of developers and service providers understands the significance of chaos testing on the live production environment, the inability to reproduce a scenario with exact steps makes it difficult to debug and identify the root cause. Reproducing the scenario is essential to fixing the problem and to preventing repeated failures of the same sort.
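To make "exact steps" concrete, the sketch below shows one way an ordered record of injected actions could be captured and replayed elsewhere. It is not the method this disclosure goes on to describe (that text is abbreviated here). The ChaosAction fields, the action labels, and every function name are hypothetical.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class ChaosAction:
    step: int    # position of the action in the injected sequence
    kind: str    # illustrative label, e.g. "terminate" or "add_latency"
    target: str  # service or instance the action was applied to


def run_and_record(targets: List[str], kinds: List[str], steps: int, seed: int) -> List[ChaosAction]:
    """Pick 'steps' random actions from a seeded generator and return them in order."""
    rng = random.Random(seed)
    return [ChaosAction(i, rng.choice(kinds), rng.choice(targets)) for i in range(steps)]


def to_json(actions: List[ChaosAction]) -> str:
    """Serialize the recorded actions so they can be moved to another environment."""
    return json.dumps([asdict(a) for a in actions])


def from_json(text: str) -> List[ChaosAction]:
    """Rebuild the recorded actions from their serialized form."""
    return [ChaosAction(**item) for item in json.loads(text)]


def replay(actions: List[ChaosAction], apply: Callable[[ChaosAction], None]) -> None:
    """Re-apply the recorded actions in their original order via 'apply'."""
    for action in sorted(actions, key=lambda a: a.step):
        apply(action)
```

With a record like this, a developer could call replay against a staging environment and step through the same sequence that exposed the failure. Purely random injection that keeps no such record leaves nothing to replay.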

The problem identified in this context: although chaos services are well-known art, their inability to recreate (or to let a developer recreate) the uncovered issues in another selected environment for debugging is seriously affecting the adoption rate of such services.

This article talks about recreating resilience and recoverability issues that were observed during one phase of chaos testi...