USING CANARY DEPLOYMENTS IN AN OUTAGE SENSITIVE ENVIRONMENT

IP.com Disclosure Number: IPCOM000249091D
Publication Date: 2017-Feb-03
Document File: 7 page(s) / 127K

Publishing Venue

The IP.com Prior Art Database

Related People

Brian Powell: AUTHOR

Abstract

Techniques are presented to ensure clients receive production-level quality and calls to a web service succeed even when unproven canary versions are deployed into the environment. This allows new (and potentially buggy) versions to be deployed side-by-side with current production quality services without negatively affecting user experience.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 45% of the total text.

Copyright 2017 Cisco Systems, Inc.

USING CANARY DEPLOYMENTS IN AN OUTAGE SENSITIVE ENVIRONMENT

AUTHORS: Brian Powell

CISCO SYSTEMS, INC.

DETAILED DESCRIPTION

There are many ways to deploy new versions of microservices. Two examples are so-called “blue/green” deployments and “canary” deployments. Typically, blue/green deployments involve two full stacks that operate in parallel, with the host device (such as a network router) sending traffic to either one stack or the other [see http://martinfowler.com/bliki/BlueGreenDeployment.html]. A canary deployment involves intermixing a new version with the current version and varying the relative amount of traffic sent to each version. Canary deployments can be percentage-based, or determined by a flag to complete an incremental rollout [see http://martinfowler.com/bliki/CanaryRelease.html].
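
As an illustration of the percentage-based variant, the following minimal sketch picks a version for each call. The function name choose_version and the labels "canary" and "production" are assumptions made for this example, not part of the disclosure.

```python
import random


def choose_version(canary_percent: float, rng: random.Random) -> str:
    """Return "canary" for roughly canary_percent of calls, else "production".

    Hypothetical helper: a real router would apply this decision per request.
    """
    if not 0.0 <= canary_percent <= 100.0:
        raise ValueError("canary_percent must be between 0 and 100")
    # rng.random() is uniform in [0.0, 1.0); scale it to a percentage.
    if rng.random() * 100.0 < canary_percent:
        return "canary"
    return "production"
```

With canary_percent set to 10, about one call in ten reaches the new version, so any given user still has a real chance of being routed to it.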

In canary deployments, the new version may contain mistakes (i.e., bugs or defects). As such, if a user's traffic (e.g., that of an outage-sensitive end user) is selected to be sent to the new version, the user's calls may fail repeatedly. Even with percentage-based routing, there is a reasonable chance that this scenario occurs. The result is a poor user experience in which the new version fails to provide the desired service.

It is desirable for the user to be able to signal that it does not want any canary versions of services in its call path. To facilitate this signal, a flag (e.g., “FORBID_CANARY : 1,” “Canary : Avoid,” etc.) is added by the retry logic. Subsequent attempts at a failed application programming interface (API) / web service call carry the flag, which causes them to be routed to production versions. Metrics for these occurrences are submitted to alert the service provider that users are experiencing problems with the canary version.
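
The client-side retry behavior could be sketched as follows. This is a minimal sketch under stated assumptions: the flag is carried as a request header, and CallFailed, send, report_metric, and the metric name "canary_failure" are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Optional

FLAG_NAME = "FORBID_CANARY"  # flag name from the text; header placement is an assumption


class CallFailed(Exception):
    """Hypothetical error raised by the transport when a call fails."""


def call_with_canary_fallback(
    send: Callable[[Dict[str, str]], str],
    report_metric: Callable[[str], None],
    max_attempts: int = 3,
) -> str:
    """Try a call; after a failure, retry with the FORBID_CANARY flag set."""
    headers: Dict[str, str] = {}
    last_error: Optional[CallFailed] = None
    for _ in range(max_attempts):
        try:
            return send(headers)
        except CallFailed as exc:
            last_error = exc
            if FLAG_NAME not in headers:
                # The unflagged attempt may have reached a canary: report it
                # so the service provider can be alerted.
                report_metric("canary_failure")
                headers = {**headers, FLAG_NAME: "1"}
    raise last_error if last_error else CallFailed("no attempts were made")
```

The first attempt is sent without the flag so that canary versions still receive traffic; only the retries are pinned to production.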

Both the initial addition of the flag and its removal could follow a circuit breaker pattern [see http://martinfowler.com/bliki/CircuitBreaker.html], such that, once the breaker has tripped, a second threshold must be met before a request is attempted without the FORBID_CANARY flag.
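
One possible reading of the two-threshold behavior is sketched below. The class name CanaryBreaker and the parameters trip_after and reset_after are assumptions; the text does not specify how either threshold is measured, so consecutive failures and consecutive successes are used here purely for illustration.

```python
class CanaryBreaker:
    """Decides whether outgoing calls should carry the FORBID_CANARY flag."""

    def __init__(self, trip_after: int = 1, reset_after: int = 5) -> None:
        self.trip_after = trip_after    # failures needed before the flag is added
        self.reset_after = reset_after  # successes needed before it is removed
        self.tripped = False
        self._failures = 0
        self._successes = 0

    def forbid_canary(self) -> bool:
        """True while calls should be sent with the flag set."""
        return self.tripped

    def record(self, success: bool) -> None:
        """Record the outcome of one call and update the breaker state."""
        if not self.tripped:
            # Count consecutive failures of unflagged calls.
            self._failures = 0 if success else self._failures + 1
            if self._failures >= self.trip_after:
                self.tripped = True
                self._successes = 0
        else:
            # Count consecutive successes of flagged calls.
            self._successes = self._successes + 1 if success else 0
            if self._successes >= self.reset_after:
                self.tripped = False
                self._failures = 0
```

A time-based second threshold (for example, a fixed number of minutes after the initial failure) would fit the same structure by replacing the success counter with a timestamp check.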

In an example, the flag is enabled for a limited period (e.g., until the call succeeds, for the next N calls, for X minutes after the initial failure, etc.). The flag may be located anywhere that the device (load balancer / router) is able to read. This includes headers and data, provided that the request is not encrypted or that the device is the secure sockets layer (SSL) / transport layer security (TLS) termination point. A contract that defines the flag and the associated behavior needs to be in place between the client(s) and the load balancer / router.
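
On the load balancer / router side, honoring such a contract might look like the sketch below. It assumes the flag arrives as a plain request header using either of the example spellings from the text; the function names forbids_canary and route and the pool labels are hypothetical.

```python
import random
from typing import Mapping


def forbids_canary(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries either example form of the flag."""
    # Compare names and values case-insensitively (an assumption of this sketch).
    normalized = {k.strip().lower(): v.strip().lower() for k, v in headers.items()}
    return normalized.get("forbid_canary") == "1" or normalized.get("canary") == "avoid"


def route(headers: Mapping[str, str], canary_percent: float, rng: random.Random) -> str:
    """Pick a pool for one request, honoring the client's flag first."""
    if forbids_canary(headers):
        return "production"
    # Unflagged requests fall back to ordinary percentage-based selection.
    if rng.random() * 100.0 < canary_percent:
        return "canary"
    return "production"
```

A flagged request is always sent to production regardless of the configured canary percentage, which is the guarantee the client relies on when it retries.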

Figure 1 below illustrates an...