
Page Level Log Merging In Shared Data Architecture

IP.com Disclosure Number: IPCOM000238160D
Publication Date: 2014-Aug-06
Document File: 7 page(s) / 152K

Publishing Venue

The IP.com Prior Art Database

Abstract

In a shared data architecture, log merging must be performed serially even though the other recovery phases can run in parallel, which makes it a performance bottleneck. With this approach, log merging can be performed in parallel rather than serially, which improves performance significantly. The basic idea is to perform log merging after the log records have been dispatched into page queues. Once the records are in page queues, merging can run on every queue in parallel, since the work on different queues does not affect the others. Additionally, some special cases require special handling.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 42% of the total text.


Overview

This disclosure describes performing log merging at the page level when recovering multiple members in a shared data architecture. With this approach, log merging can be performed in parallel rather than serially, which improves recovery performance significantly.

Background

Shared Data Architecture

A shared data architecture is a classic RDBMS scenario: multiple independent members concurrently access a global data set through a clustered file system, while each member writes log records to its own independent log stream.


Log Merging

Usually an LSN (Log Sequence Number) is used to order log records. The LSN is an ever-increasing counter for a single log stream, so it is sufficient to order log records for recovery in a non-shared data architecture.

In a shared data architecture, the order of all log records cannot be determined from LSNs alone, because the members are independent: LSNs from different members have no relative order. So an additional global timestamp is recorded when each log record is generated. For recovery in a shared data architecture, all the log streams must be merged according to this global timestamp before the logs are replayed.
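The serial merge described above can be sketched as an n-way merge keyed on the global timestamp. This is a minimal illustration; the record shape `(timestamp, lsn, payload)` and the function name are assumptions, not the product's actual API.

```python
# Serial log merging across member log streams, keyed on the global
# timestamp. Each per-member stream is already ordered by timestamp.
import heapq

def merge_log_streams(streams):
    """Merge per-member streams of (timestamp, lsn, payload) tuples
    into one stream ordered by the global timestamp."""
    # heapq.merge performs an n-way merge lazily, without loading
    # every stream into memory at once.
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))

member1 = [(1, 100, "update A"), (4, 101, "update B")]
member2 = [(2, 200, "update C"), (3, 201, "update D")]
merged = merge_log_streams([member1, member2])
# timestamps now appear in global order: 1, 2, 3, 4
```

Note that this step is inherently sequential in its output: a single totally ordered stream must be produced before anything downstream can consume it.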

Recovery Processing in Shared Data Architecture


Recovery of an RDBMS means replaying the log records in the order in which they were generated. The order must be preserved because log records may depend on one another; breaking the order can break the consistency of the data.

First, logs are read from all the log streams. Then, as described in the section above, the log streams are merged before being replayed. Note that the merging is serial, because all the log records must be ordered according to the global timestamp.

After that, we get a single log stream. The log records in this stream are dispatched into queues by hashing on the page to which each record belongs. As a result, all log records that belong to the same page are added to the same queue.
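The dispatch step can be sketched as follows. The record shape, queue count, and function name are illustrative assumptions; the key property is that records for one page always land in one queue, in their merged order.

```python
# Dispatch merged log records into page queues by hashing the page id.
from collections import defaultdict

NUM_QUEUES = 4  # illustrative; a real system would tune this

def dispatch(records):
    """Route each (timestamp, page_id, payload) record to a queue
    chosen by hashing its page id. Records for the same page always
    map to the same queue, so their relative order is preserved."""
    queues = defaultdict(list)
    for rec in records:
        _, page_id, _ = rec
        queues[hash(page_id) % NUM_QUEUES].append(rec)
    return queues

merged = [(1, "P1", "update A"), (2, "P2", "update C"), (3, "P1", "update B")]
queues = dispatch(merged)
```

Because the input stream is already globally ordered, appending preserves the per-page timestamp order within each queue.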

Finally, multiple log replayers read the log records from the queues and replay them. Since there is no dependency between different queues (they hold records for different pages), the replayers can work in parallel. (There is also a special case involving dependencies across multiple pages, which is covered in the last section.)
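The parallel replay phase can be sketched with one worker per queue. The queue contents and the `replay_queue` helper are hypothetical stand-ins; applying a record to its page is simulated by collecting it into a list.

```python
# Parallel replay: one worker per page queue, valid because queues
# hold records for disjoint sets of pages.
from concurrent.futures import ThreadPoolExecutor

def replay_queue(records):
    """Replay the records of one queue in order. Appending to a list
    stands in for actually applying each record to its page."""
    applied = []
    for rec in records:
        applied.append(rec)
    return applied

page_queues = {
    0: [(1, "P0", "update A"), (3, "P0", "update B")],
    1: [(2, "P1", "update C")],
}

# Queues are independent, so replayers can run concurrently.
with ThreadPoolExecutor(max_workers=len(page_queues)) as pool:
    results = dict(zip(page_queues, pool.map(replay_queue, page_queues.values())))
```

Within each queue the order is still respected; parallelism exists only across queues.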

The Shortcoming

Obviously, the bottleneck of this recovery processing is the log merging. Every other phase (reading the log streams, replaying the log records) can be performed in parallel, but the log merging must be performed serially because all the log records need to be ordered by the global timestamp. This heavily impacts performance and is one of the major issues with recovery performance in a shared data architecture.


Core Idea

The basic idea is to perform the log merging after the log records have been dispatched into the page queues.
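A minimal sketch of this idea, under illustrative assumptions about record shapes and names: the per-member streams are dispatched straight into page queues with no global merge, and each queue is then ordered by the global timestamp independently, so the ordering work can run in parallel across queues.

```python
# Core idea: dispatch first, then merge each page queue in parallel.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_QUEUES = 4  # illustrative

def dispatch_unmerged(streams):
    """Dispatch records from every member stream directly into page
    queues, skipping the global serial merge entirely."""
    queues = defaultdict(list)
    for stream in streams:
        for ts, page_id, payload in stream:
            queues[hash(page_id) % NUM_QUEUES].append((ts, page_id, payload))
    return queues

def merge_queue(records):
    # Only the records that landed in this queue need ordering.
    return sorted(records, key=lambda rec: rec[0])

def parallel_merge(queues):
    # Queues are independent, so each sort can run on its own worker.
    with ThreadPoolExecutor() as pool:
        return dict(zip(queues, pool.map(merge_queue, queues.values())))

member1 = [(1, "P1", "update A"), (4, "P2", "update D")]
member2 = [(2, "P1", "update B"), (3, "P2", "update C")]
queues = parallel_merge(dispatch_unmerged([member1, member2]))
```

After this step every queue is internally ordered by the global timestamp, which is all the replayers require, since records in different queues belong to different pages.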

Firstly, ther...