
Failure Localization in a Distributed/Microservice Environment

IP.com Disclosure Number: IPCOM000248317D
Publication Date: 2016-Nov-15
Document File: 4 page(s) / 51K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to identify the source of a failure in a distributed/microservice environment by using returned data objects to maintain a record of a call chain and call results. The method is for each Application Programming Interface (API) to build a data structure that is returned to the preceding layer, with each layer having the ability to log the structure for redundancy.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

When Application Programming Interface (API) calls fail in a distributed/microservice environment, it can be very difficult to determine where the failure actually occurred. A single API call may in fact be several nested calls, with APIs on one microservice calling APIs on other microservices. The result is a series of interconnected calls, and a failure at any step causes the entire call chain to fail. When such a failure happens, the microservice that caused it must be identified before a fix can be created. However, in a large system with many nested calls (especially where different teams are responsible for different microservices), it can be difficult even to know which microservices are called, much less which one failed.

Figure 1: In a series of interconnected calls, a failure at any step causes the entire call chain to fail

The novel solution involves the use of returned data objects to maintain a record of a call chain and call results. The method is for each API to build a data structure that is returned to the preceding layer, with each layer having the ability to log the structure for redundancy. The structure built by each layer includes the following elements:

- Request ID (passed through the call stack so that every microservice uses the same request ID)
- API called
- Success/failure information
- Time needed for the API to run
- Other APIs called by the original API
- Data structures returned by other APIs
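
One way to picture the structure listed above is as a small recursive record. The following Python sketch is an illustration only: the disclosure does not specify field names or a serialization format, so `CallRecord`, its fields, the JSON encoding, and the sample API names are all assumptions. It folds "other APIs called" and "data structures returned by other APIs" into a single `children` list, since each child record already names the API that produced it.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class CallRecord:
    """One layer's entry in the returned structure (hypothetical field names)."""

    request_id: str          # same ID at every layer of the call stack
    api: str                 # API called
    success: bool            # success/failure information
    error_code: str | None   # compact failure detail; None on success
    duration_ms: float       # time needed for the API to run
    # Records returned by the other APIs this API called.
    children: list[CallRecord] = field(default_factory=list)


# Hypothetical chain: "orders" calls "inventory" and "billing"; "billing" fails.
record = CallRecord(
    request_id="req-123",
    api="orders/create",
    success=False,
    error_code="DOWNSTREAM",
    duration_ms=48.0,
    children=[
        CallRecord("req-123", "inventory/reserve", True, None, 12.5),
        CallRecord("req-123", "billing/charge", False, "TIMEOUT", 30.0),
    ],
)

# Each layer can return this to the preceding layer and also log it.
print(json.dumps(asdict(record), indent=2))
```

Keeping failure detail to a short code rather than a full message is one way to keep the record small enough to build on every call.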

This structure allows the entire call stack to be reconstructed for each call, so the failing APIs can be located at a glance. It can also be made lightweight enough (using error codes and the like) to be recorded for every call rather than for only a sample. If the error messages do not contain enough information to fully debug the problem, the call trace still shows which microservice failed. The person debugging can then go directly to the logs for the failed microservice instead of digging through debug data for all microservices in search of a clue as to what went wrong.
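
As a rough illustration of how a returned call trace could point to the failing microservice, the sketch below walks a nested record and reports the deepest failed calls. The dictionary keys (`api`, `success`, `error_code`, `children`) and the sample API names are hypothetical, and the sketch assumes that a failure always propagates to every caller above it, as described earlier.

```python
def failing_leaves(record, path=()):
    """Yield (call path, record) for each failed call that has no failed child."""
    path = path + (record["api"],)
    if record["success"]:
        return  # assumption: a successful call has no failure beneath it
    failed_children = [c for c in record.get("children", []) if not c["success"]]
    if not failed_children:
        # Nothing below this call failed, so the failure originated here.
        yield path, record
    for child in failed_children:
        yield from failing_leaves(child, path)


# Hypothetical trace returned to the outermost caller.
trace = {
    "request_id": "req-123",
    "api": "orders/create",
    "success": False,
    "error_code": "DOWNSTREAM",
    "children": [
        {"api": "inventory/reserve", "success": True, "children": []},
        {"api": "billing/charge", "success": False,
         "error_code": "TIMEOUT", "children": []},
    ],
}

for call_path, failed in failing_leaves(trace):
    print(" -> ".join(call_path), failed.get("error_code"))
# prints: orders/create -> billing/charge TIMEOUT
```

The printed path names the microservice whose logs are worth opening first.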

Each microservice in the environment connects to a framework that generates the callback data structure for API calls. This framework is linked into the microservice's APIs as well as into the API calls to other microservices in the environment. This framework builds a returned data structure from data generated by the framework and from information passed into t...