A Method and System for Achieving High Availability and Disaster Recovery within Stretched Hybrid Clouds

IP.com Disclosure Number: IPCOM000239022D
Publication Date: 2014-Oct-01
Document File: 5 page(s) / 328K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system is disclosed for achieving high availability and disaster recovery within stretched hybrid clouds.

This is the abbreviated version, containing approximately 51% of the total text.

Disclosed is a method and system for achieving high availability and disaster recovery within stretched hybrid clouds.

The method and system provides fine-grained control over replica placement to achieve high availability and disaster recovery with hybrid stretch clusters. The method and system utilizes Write Affinity Failure Group (WAFG) to control data placement in local clusters and remote clusters. In addition, the method and system tolerates site failures as well as node failures in local and remote clusters that use locally attached storage.

The method includes a step of achieving synchronous High Availability (HA) and Disaster Recovery (DR) on local storage by means of intelligent data placement algorithms developed for workload management. Thereafter, following a DR event, the method and system recovers data so that HA is preserved and node failures at the remote site can still be tolerated without any storage-level outage. Once the primary site has recovered, the method and system uses WAFG to restore the initial layout.

In one scenario, the method and system enables placement of only one replica in a failure group, wherein the failure group is defined as a unit of failure in which a disk or a node failure disables the whole group.
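
The one-replica-per-failure-group rule can be illustrated with a minimal sketch. It is not the disclosed placement algorithm; the `Node` type, the `place_replicas` function, and the group labels are hypothetical names introduced only for illustration, and the first-fit selection order is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    failure_group: str  # hypothetical label: nodes sharing it fail together


def place_replicas(nodes, replica_count):
    """Pick nodes so that no failure group holds more than one replica."""
    chosen = []
    used_groups = set()
    for node in nodes:
        if node.failure_group in used_groups:
            continue  # this failure group already holds a replica
        chosen.append(node)
        used_groups.add(node.failure_group)
        if len(chosen) == replica_count:
            return chosen
    raise ValueError("not enough distinct failure groups for the requested replicas")


# Illustrative cluster: n1 and n2 share a failure group, so only one of them is used.
nodes = [
    Node("n1", "fg-a"),
    Node("n2", "fg-a"),
    Node("n3", "fg-b"),
    Node("n4", "fg-c"),
]
placement = place_replicas(nodes, 3)
```

Because a failure anywhere in a group disables the whole group, a second replica in the same group would add no protection, which is why the sketch skips `n2`.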

In another scenario, the method and system enables the current replica placement for a disaster recovery configuration, which defines the local cluster as one failure group and the remote cluster as a different failure group. Here, a local node failure causes the replica to be fetched from the remote cluster, thus incurring a Wide Area Network (WAN) penalty on reads.
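
The read penalty in this two-failure-group configuration can be sketched as follows. The example assumes one replica per cluster and a reader that prefers a replica at its own site; `pick_read_source`, the node names, and the site labels are hypothetical and are not taken from the disclosure.

```python
def pick_read_source(replicas, reader_site, failed_nodes):
    """Return (node, crosses_wan) for a healthy replica, preferring the reader's site."""
    healthy = [r for r in replicas if r["node"] not in failed_nodes]
    if not healthy:
        raise RuntimeError("no healthy replica available")
    # Stable sort: replicas at the reader's site (key False) come before remote ones.
    healthy.sort(key=lambda r: r["site"] != reader_site)
    best = healthy[0]
    return best["node"], best["site"] != reader_site


# Local cluster is one failure group, remote cluster is the other.
two_group_layout = [
    {"node": "local-1", "site": "local"},
    {"node": "remote-1", "site": "remote"},
]

normal_read = pick_read_source(two_group_layout, "local", failed_nodes=set())
degraded_read = pick_read_source(two_group_layout, "local", failed_nodes={"local-1"})
```

With the local node healthy, the read stays on site; once it fails, the only remaining copy is remote, so the second call reports a WAN crossing.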

In another scenario, the method and system enables three replicas to support two local failure groups and one remote failure group. A replica placement algorithm is utilized to ensure that there is one copy per fai...
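
A minimal sketch of the three-replica layout described above, with two local failure groups and one remote failure group, shows which copies survive a given failure. The group names, site labels, and helper functions are hypothetical, and the sketch does not reproduce the disclosed replica placement algorithm.

```python
# Hypothetical layout: failure group name -> site that hosts it.
LAYOUT = {
    "local-fg1": "local",
    "local-fg2": "local",
    "remote-fg1": "remote",
}


def surviving_groups(layout, failed_groups=frozenset(), failed_sites=frozenset()):
    """Failure groups whose replica is still readable after the given failures."""
    return sorted(
        group
        for group, site in layout.items()
        if group not in failed_groups and site not in failed_sites
    )


def can_read_locally(layout, reader_site, **failures):
    """True if at least one surviving replica sits at the reader's own site."""
    return any(layout[g] == reader_site for g in surviving_groups(layout, **failures))


after_node_failure = surviving_groups(LAYOUT, failed_groups={"local-fg1"})
after_site_failure = surviving_groups(LAYOUT, failed_sites={"local"})
```

Under these assumptions, losing one local failure group still leaves a local copy, so reads avoid the WAN, while losing the whole local site leaves the remote copy available for disaster recovery.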