Quickly Minimizing Downtime of High Availability Databases by Reintegrating Failed Primary Database with New Primary Database (Old Standby) in HA Environment
Publication Date: 2015-Aug-04
The IP.com Prior Art Database
Disclosed is a method to minimize the downtime of a High Availability (HA) database in the event of a primary database failure. The method tracks Log Sequence Numbers (LSNs) on the primary and standby databases at the time of failure. With these LSNs, administrators can determine the log gap and effect a rollback on the old primary, so that it matches the LSN the old standby database had reached at the time of the crash.
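The rollback summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation. The `LogRecord` type, the dictionary standing in for the database contents, the sample LSN values, and the function `roll_back_to_lsn` are all hypothetical, and every log record is assumed to carry the before-image needed to undo it. The records with LSNs above the old standby's LSN at failure form the log gap; undoing them newest first leaves the old primary at the standby's LSN.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LogRecord:
    """Hypothetical log record that carries its own undo information."""
    lsn: int
    key: str
    before: Optional[Any]   # value before the change; None means the key did not exist
    after: Any              # value after the change


def roll_back_to_lsn(state, log, target_lsn):
    """Undo every change with LSN > target_lsn, newest first (hypothetical helper).

    `state` is a dict standing in for the database contents and `log` is a
    list of LogRecord in LSN order. Returns (kept_records, undone_records);
    the undone records are the log gap between the two recorded LSNs.
    """
    gap = [rec for rec in log if rec.lsn > target_lsn]
    for rec in reversed(gap):
        if rec.before is None:
            state.pop(rec.key, None)     # the change was an insert: remove the key
        else:
            state[rec.key] = rec.before  # restore the before-image
    kept = [rec for rec in log if rec.lsn <= target_lsn]
    return kept, gap


if __name__ == "__main__":
    # LSNs recorded at the time of failure (illustrative values only).
    primary_lsn_at_failure = 400
    standby_lsn_at_failure = 200

    state = {"a": 1, "b": 20, "c": 3}
    log = [
        LogRecord(100, "a", None, 1),
        LogRecord(200, "b", None, 2),
        LogRecord(300, "b", 2, 20),     # never replayed on the standby
        LogRecord(400, "c", None, 3),   # never replayed on the standby
    ]
    kept, gap = roll_back_to_lsn(state, log, standby_lsn_at_failure)
    print(f"log gap: ({standby_lsn_at_failure}, {primary_lsn_at_failure}]")
    print([r.lsn for r in gap], state)  # prints [300, 400] {'a': 1, 'b': 2}
```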
Almost all relational databases provide High Availability (HA) and Disaster Recovery (DR) options. Ensuring that data is always available is a top priority for any organization, because minutes of downtime can cause a significant loss of revenue and reputation. Organizations that need their data to be online continuously, or with minimal downtime, therefore adopt HA and DR solutions.
An existing HADR system has a designated primary database, which is up and running and to which all applications connect. A standby database, which is a copy of the primary database, runs on a different server; applications can perform only read-only operations on the standby. The standby is continuously updated by replaying the logs generated by database activity on the primary. If the primary database fails, the database administrator (DBA) can issue a takeover command on the standby so that it becomes the new primary, and all applications are then pointed to the new primary. When the original primary comes back up, it is reintegrated as a standby. This provides minimal downtime for the running applications and, in turn, minimal downtime for the business.
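The log-shipping and takeover flow described above can be modelled with a small in-memory simulation. This is a hypothetical sketch: the `Database` class, `ship_and_replay`, and `takeover` are illustrative names rather than commands of any real database, and LSNs are modelled as consecutive integers for readability, whereas real systems typically use log positions. The demonstration also reproduces, in this toy model, the failure pattern this disclosure is concerned with: records the standby never replayed, followed by new writes on the new primary at the same LSNs.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy model of one HADR member (hypothetical, for illustration only)."""
    name: str
    role: str                                 # "primary" or "standby"
    log: list = field(default_factory=list)   # (lsn, payload) tuples in LSN order
    last_lsn: int = 0                         # highest LSN written or replayed here

    def write(self, payload):
        """Apply an application write; only the primary accepts writes."""
        if self.role != "primary":
            raise PermissionError(f"{self.name} is a read-only standby")
        self.last_lsn += 1                    # assumption: LSNs are consecutive ints
        self.log.append((self.last_lsn, payload))
        return self.last_lsn


def ship_and_replay(primary, standby, upto=None):
    """Replay primary log records that the standby has not yet applied.

    `upto` optionally stops replay at a given LSN, e.g. to simulate a crash
    that happens before the remaining records reach the standby.
    """
    for lsn, payload in primary.log:
        if lsn <= standby.last_lsn:
            continue                          # already replayed
        if upto is not None and lsn > upto:
            break                             # simulated crash: nothing more arrives
        standby.log.append((lsn, payload))
        standby.last_lsn = lsn


def takeover(standby):
    """Simulate the DBA's takeover command: the standby becomes the new primary."""
    standby.role = "primary"
    return standby


if __name__ == "__main__":
    old = Database("db1", "primary")
    new = Database("db2", "standby")
    for i in range(5):
        old.write(f"txn{i}")
    ship_and_replay(old, new, upto=3)         # LSNs 4 and 5 never reach the standby
    takeover(new)
    new.write("txn-after-takeover")           # receives LSN 4 on the new primary
    print(old.log[3], new.log[3])             # prints (4, 'txn3') (4, 'txn-after-takeover')
```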
When the original primary server comes online after repair, the DBA must reintegrate it with the new primary as the new standby server. However, in specific HADR scenarios the log stream of the old primary does not match that of the standby, for example because some logs had not been replayed on the standby when the crash occurred. In such scenarios, re-integration involves time-consuming tasks: taking a new backup of the new primary, restoring it on the new standby (that is, the old primary) to re-initiate the HA feature, and reconfiguring the HADR parameters on the new standby system. Depending on the size of the production database, the backup and restore operations may take several hours.
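One way to detect the log-stream mismatch described above is to compare the two logs record by record and find the first LSN at which they diverge. The sketch below is hypothetical: logs are modelled as lists of `(lsn, payload)` tuples, and `find_divergence_lsn` and `can_reintegrate_directly` are illustrative names, not commands of any particular database. If the old primary's log is a prefix of the new primary's log, it can rejoin as a standby and catch up by replay; otherwise the old primary holds records the new primary never received, which is the case that traditionally forces a full backup and restore.

```python
def find_divergence_lsn(old_primary_log, new_primary_log):
    """Return the first LSN at which the two logs differ, or None if the old
    primary's log is a prefix of the new primary's log (hypothetical helper).

    Both arguments are lists of (lsn, payload) tuples sorted by LSN.
    """
    for (old_lsn, old_rec), (new_lsn, new_rec) in zip(old_primary_log, new_primary_log):
        if old_lsn != new_lsn or old_rec != new_rec:
            return min(old_lsn, new_lsn)
    if len(old_primary_log) > len(new_primary_log):
        # The old primary wrote records that never reached the standby.
        return old_primary_log[len(new_primary_log)][0]
    return None


def can_reintegrate_directly(old_primary_log, new_primary_log):
    """True if the old primary can rejoin as a standby without rollback or restore."""
    return find_divergence_lsn(old_primary_log, new_primary_log) is None


if __name__ == "__main__":
    old_log = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
    new_log = [(1, "a"), (2, "b"), (3, "c"), (4, "x")]  # LSN 4 written after takeover
    print(find_divergence_lsn(old_log, new_log))        # prints 4
    print(can_reintegrate_directly(old_log, new_log))   # prints False
```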
This means that the business runs without an HA solution for the time required to back up the new primary, restore the backup on the new standby, and reconfigure the HA parameters on the new standby server. In the worst case, if the new primary fails during these re-integration tasks, database availability is seriously compromised.
The novel solution is a method to keep track of Log Sequence Numbers (LSNs) on the primary and standby databases at the time of failure on the primary server. Having the LSNs from both servers at the time of the crash can help administrators determ...