
Method to avoid two active primary (split brain) when do automatic takeover in HADR

IP.com Disclosure Number: IPCOM000198628D
Publication Date: 2010-Aug-11

Publishing Venue

The IP.com Prior Art Database

Abstract

This idea adds an automatic failover system to an HADR environment. The system has three parts: an 'adapter' running on the primary database server, an 'adapter' running on the standby database server, and a 'coordinator' running on another, independent server. The system can perform failover automatically, and it will not cause ‘split brain’ (two primary databases) in any case.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 31% of the total text.


In a database high availability disaster recovery (HADR) environment, the primary data center and the standby data center are usually several kilometers apart. In the normal case, the standby database server receives logs from the primary database server and replays them. Chart 1 shows the normal status.


Chart 1: HADR normal status

[Chart labels: Primary Data Center (Primary Server, Primary Database); Standby Data Center (Standby Server, Standby Database); Log Transfer; Client; several kilometers between the two centers]

When a disaster occurs or something goes wrong, we must first confirm which part is faulty. If the primary database is unavailable, we can fail over and let the standby database become the new primary database. If the standby database is unavailable, we can let the primary database run as a single database server. If the network has a problem, we can select one server as the primary server. But before doing any of this, we must call the other side to confirm the faulty part. This takes several minutes, and longer still if the telephone line is broken. Chart 2 shows this case.
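The three cases described above amount to a small decision table. The following Python sketch only restates that table; the names `Fault` and `manual_action` are hypothetical and introduced for illustration, and the fault is assumed to have already been confirmed (for example, by the telephone call).

```python
from enum import Enum


class Fault(Enum):
    """The three fault cases named in the text (hypothetical names)."""
    PRIMARY_DOWN = "primary database unavailable"
    STANDBY_DOWN = "standby database unavailable"
    NETWORK_DOWN = "network problem between the centers"


def manual_action(fault: Fault) -> str:
    """Return the action taken once the faulty part has been confirmed."""
    if fault is Fault.PRIMARY_DOWN:
        return "fail over: standby database becomes the new primary"
    if fault is Fault.STANDBY_DOWN:
        return "keep the primary running as a single database server"
    if fault is Fault.NETWORK_DOWN:
        return "select one server as the primary server"
    raise ValueError(f"unknown fault: {fault!r}")


if __name__ == "__main__":
    for fault in Fault:
        print(f"{fault.value} -> {manual_action(fault)}")
```

The table itself is trivial; the slow part in practice is obtaining the confirmed `Fault` value, which is the step that requires the call to the other side.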

Chart 2: HADR disaster status

[Chart labels: Primary Data Center (Primary Server, Primary Database, DBA); Standby Data Center (Standby Server, Standby Database, DBA); Log Transfer (Broken); Telephone Call; Client]

While the faulty part is being confirmed and the standby server is taking over, clients cannot connect to either server and can only wait. The whole failover usually takes more than half an hour. Why must we wait for this inefficient and fallible process? Because if we let the primary and standby check each other and the network between them breaks, the result is "split brain": the standby server turns itself into the new primary server while the old primary server still acts as primary. There are then two primary servers at the same time, and clients do not know which one is the "real" primary. If write operations are performed, the two servers become inconsistent, which can lose data or even stop the business. Chart 3 shows "split brain" (two primary servers).
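A small simulation can make the failure mode concrete. The sketch below is a toy model, not the disclosed system: it assumes a naive rule in which the standby promotes itself whenever it cannot reach the primary, and every name in it (`Server`, `naive_peer_check`, `simulate`) is hypothetical. It shows that the standby cannot tell a dead primary from a broken link.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    role: str          # "primary" or "standby"
    alive: bool = True


def peer_reachable(peer: Server, link_up: bool) -> bool:
    """A peer answers only if it is alive AND the network link works."""
    return link_up and peer.alive


def naive_peer_check(me: Server, peer: Server, link_up: bool) -> None:
    """Naive rule (assumed): a standby that cannot reach its peer promotes itself."""
    if me.alive and me.role == "standby" and not peer_reachable(peer, link_up):
        me.role = "primary"


def simulate(link_up: bool, primary_alive: bool) -> List[str]:
    """Run one peer check and return the names of all active primaries."""
    a = Server("A", "primary", alive=primary_alive)
    b = Server("B", "standby")
    naive_peer_check(b, a, link_up)
    return [s.name for s in (a, b) if s.alive and s.role == "primary"]


if __name__ == "__main__":
    print("normal:        ", simulate(link_up=True, primary_alive=True))
    print("primary failed:", simulate(link_up=True, primary_alive=False))
    print("link broken:   ", simulate(link_up=False, primary_alive=True))
```

In the last scenario both servers are alive and both hold the primary role, which is the two-primary state shown in Chart 3. From the standby's point of view the second and third scenarios look identical, which is why peer checking alone is not enough.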

Chart 3: HADR "split brain" status

[Chart labels: Primary Data Center (Primary Server, Primary Database); Standby Data Center (Primary Server, Primary Database); Log Transfer (Broken); Client]

So we face a dilemma when a disaster occurs or something goes wrong. If we use the telephone, confirmation takes a long time and is unreliable. If we let the primary and standby check each other, the system may go into "split brain" status. We therefore want a fast, reliable method to handle failover.

My idea is to add an automatic failover system to the HADR environment. The system has three parts: an 'adapter' running on the primary database server, an 'adapter' running on the standby database server, and a 'coordinator' running on another independent...