
Method to elect resilient island in a Cluster split to ensure data integrity and high availability

IP.com Disclosure Number: IPCOM000249563D
Publication Date: 2017-Mar-03
Document File: 8 page(s) / 94K

Publishing Venue

The IP.com Prior Art Database

Abstract

In a high-availability cluster, nodes can lose network connectivity, storage connectivity, or both. This leads to a split-brain situation in which two or more islands (sub-clusters) are formed. Each island assumes the other islands are down, brings up the application, and acquires the resources the application needs, such as file systems/disks and IP labels. This can corrupt data when more than one island writes to the same disk simultaneously.

To avoid such conditions, several split-merge policies exist, such as Tiebreaker (using a disk or Network File System), Majority, and Priority.

The idea is to propose a way to handle cluster split scenarios that does not depend on any assumptions and is independent of the split and merge policies. When a split occurs, irrespective of the number of islands, an algorithm runs on all nodes of the cluster, and every node decides for itself whether it is capable of hosting the application by acquiring all necessary resources. The node decides based on the checks listed below:

1. Network connectivity check to make sure this node is reachable from outside.
2. Storage connectivity check to verify disks used by applications are accessible.

A node is eligible to host the application only if all the required resources are available. After all nodes have completed this check, the nodes that are eligible to host the application must synchronize and decide among themselves where to host it. Nodes that are not eligible to bring up the application must either be rebooted or have their cluster services disabled.
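The eligibility rule and the follow-up decision can be sketched in Python. This is a minimal illustration, not the disclosure's implementation: the `NodeStatus` type, the `elect_host` function, the action strings, and the lowest-name selection rule are all assumptions introduced here, since the text only states that the eligible nodes must agree among themselves.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class NodeStatus:
    """Self-check results reported by one node (hypothetical structure)."""
    name: str
    network_ok: bool   # node is reachable from outside
    storage_ok: bool   # disks used by the application are accessible

    @property
    def eligible(self) -> bool:
        # A node may host the application only if every required resource is available.
        return self.network_ok and self.storage_ok


def elect_host(nodes: List[NodeStatus]) -> Tuple[Optional[str], Dict[str, str]]:
    """Return the chosen host (or None) and an action for every node.

    Assumption: eligible nodes agree by picking the lowest node name, a
    deterministic rule that each node can evaluate on the same input.
    """
    eligible = sorted(n.name for n in nodes if n.eligible)
    host = eligible[0] if eligible else None
    actions: Dict[str, str] = {}
    for n in nodes:
        if not n.eligible:
            # Ineligible nodes are rebooted or have cluster services disabled.
            actions[n.name] = "reboot_or_disable_cluster_services"
        elif n.name == host:
            actions[n.name] = "host_application"
        else:
            actions[n.name] = "standby"
    return host, actions


if __name__ == "__main__":
    statuses = [
        NodeStatus("nodeA", network_ok=False, storage_ok=True),
        NodeStatus("nodeB", network_ok=True, storage_ok=True),
        NodeStatus("nodeC", network_ok=True, storage_ok=True),
    ]
    host, actions = elect_host(statuses)
    print("host:", host)
    for name, action in actions.items():
        print(name, "->", action)
```

Any deterministic rule would serve in place of the lowest-name choice; the point of the sketch is only that the decision is taken among nodes that passed both checks, regardless of which island they ended up in.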

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 28% of the total text.


Method to elect resilient island in a Cluster split to ensure data integrity and high availability

In a high-availability cluster, nodes can lose network connectivity, storage connectivity, or both. This leads to a split-brain situation in which two or more islands (sub-clusters) are formed. Each island assumes the other islands are down, brings up the application, and acquires the resources the application needs, such as file systems/disks and IP labels. This can corrupt data when more than one island writes to the same disk simultaneously.

To avoid such conditions, several split-merge policies exist, such as Tiebreaker (using a disk or Network File System), Majority, and Priority. However, these existing policies have limitations.

Listed below are the drawbacks of some popular split-merge policies during split or merge situations:

1. If Disk Tiebreaker is used as the split or merge policy:
   a. The disk used as the tiebreaker must be shared across all sites. This is not possible if the sites are geographically far apart and there is no common storage.
   b. If access to the tiebreaker disk is also lost on all nodes, none of the nodes takes over the applications, because each island assumes the other island will bring them up.
   c. If the island that wins the tiebreaker is the island that lost network connectivity to the outside world, there is no use in bringing up the applications, because outside clients will not be able to reach PowerHA. Hence it is not advisable to bring up applications on this island.

2. If NFS Tiebreaker is used as the split or merge policy:
   a. If network connectivity is lost on the NFS server, none of the nodes can reach the NFS server to acquire the NFS tiebreaker, which leaves all islands on the losing side.
   b. If storage connectivity is lost on the island that wins the NFS tiebreaker, it is not advisable to host applications on that island, assuming any application needs both network and storage connectivity to be healthy.

3. If Priority or Majority is used as the split or merge policy:
   a. If network or storage connectivity is lost on the island that continues to host the application based on the Priority or Majority policy, it is not advisable to host applications on that island, assuming any application needs both network and storage connectivity to be healthy.
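The tiebreaker drawbacks listed above can be made concrete with a toy model. The Python sketch below is an illustration under stated assumptions: the `Island` type and the `tiebreaker_winner` function are hypothetical, and the tiebreaker is modeled simply as "the first island that can reach the tiebreaker wins", with no regard to the winner's own network or storage health.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Island:
    """One sub-cluster after a split (hypothetical structure)."""
    name: str
    reaches_tiebreaker: bool  # can acquire the disk or NFS tiebreaker
    network_ok: bool          # reachable by outside clients
    storage_ok: bool          # application disks accessible

    @property
    def healthy(self) -> bool:
        return self.network_ok and self.storage_ok


def tiebreaker_winner(islands: List[Island]) -> Optional[Island]:
    """Return the first island able to acquire the tiebreaker, ignoring health."""
    for island in islands:
        if island.reaches_tiebreaker:
            return island
    return None


if __name__ == "__main__":
    # The winner has lost outside network connectivity, so clients cannot reach it.
    isolated_winner = [
        Island("islandA", reaches_tiebreaker=True, network_ok=False, storage_ok=True),
        Island("islandB", reaches_tiebreaker=False, network_ok=True, storage_ok=True),
    ]
    winner = tiebreaker_winner(isolated_winner)
    print(winner.name, "healthy:", winner.healthy)

    # No island can reach the tiebreaker, so no island hosts the application.
    nobody_wins = [
        Island("islandA", reaches_tiebreaker=False, network_ok=True, storage_ok=True),
        Island("islandB", reaches_tiebreaker=False, network_ok=True, storage_ok=True),
    ]
    print(tiebreaker_winner(nobody_wins))
```

In the first scenario the policy selects an island that cannot serve clients while a fully healthy island stands down; in the second, two healthy islands both lose.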

The idea is to propose a way to handle cluster split scenarios that does not depend on any assumptions and is independent of the split and merge policies. When a split occurs, irrespective of the number of islands, an algorithm runs on all nodes of the cluster, and every node decides for itself whether it is capable of hosting the application by acquiring all necessary resources. The node decides based on the checks listed below:


1. Network connectivity check to make sure this node is reachable from outside.
2. Storage connectivity check to verify disks used by applications are accessible.
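The two checks could be probed along the following lines. This is a rough Python sketch, not the mechanism the disclosure uses: the function names, the idea of testing reachability by opening a TCP connection to an external reference endpoint, and the use of `os.access` on the disk paths are all assumptions made for illustration. An outbound connection is only a proxy for "reachable from outside".

```python
import os
import socket
from typing import Iterable, Tuple


def network_reachable(reference: Tuple[str, int], timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the reference endpoint succeeds.

    Assumption: `reference` is a (host, port) pair outside the cluster,
    for example a gateway or a well-known service chosen by the operator.
    """
    try:
        with socket.create_connection(reference, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def storage_accessible(disk_paths: Iterable[str]) -> bool:
    """Return True if every listed path exists and is readable and writable.

    Assumption: an empty list counts as a failure, so that a missing
    configuration is not mistaken for healthy storage.
    """
    paths = list(disk_paths)
    return bool(paths) and all(os.access(p, os.R_OK | os.W_OK) for p in paths)


def node_is_eligible(reference: Tuple[str, int],
                     disk_paths: Iterable[str],
                     timeout: float = 2.0) -> bool:
    """A node is eligible only if both checks pass."""
    return network_reachable(reference, timeout) and storage_accessible(disk_paths)
```

A production check would likely go further, for example by issuing real I/O against the disks rather than testing permissions only.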

A node is eligible to host the application only if all the required resources are available. After all nodes are...