Crash-Consistent Clustered Storage System's Minimal Instruction Set Logging

IP.com Disclosure Number: IPCOM000245047D
Publication Date: 2016-Feb-06
Document File: 3 page(s) / 40K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to perform log operations using one operation code: one or more writes to persistent storage. Each write is described by a persistent storage address and a value to store at that address. Multiple write operations are combined into an atomic unit by specifying a write count. The method is to read the log entries and apply the described modifications in the order of logging.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 42% of the total text.

A system is crash-consistent if, in the event of a crash, it is in, or is recoverable to, a state in which all rules are maintained.

A cluster is a system composed of one or more computer nodes. During a concurrent code load, when a cluster's code is being updated, some nodes in the cluster might run old code while others run new code. If a node running new code wants to perform a particular operation, it must consider whether, in the event of a crash, another node, perhaps one running old code, is able to roll the operation forward or backward such that the system is consistent.

Each log entry must be declared complete for a recovery to recognize that it need not be re-applied. However, if the storage modifications that a log entry describes are made and the system crashes before the associated log entry is declared complete, re-applying the modifications must achieve the same result; that is, application must be idempotent. Achieving idempotence is the responsibility of whoever defines how an operation is interpreted. If, on application of a log entry, a node performed a read and then performed some operation based on the value it read, the value read could be different when the operation is performed again, leading to a different outcome.
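The idempotence concern can be illustrated with a minimal sketch. It assumes persistent storage is modeled as a Python dict from address to value; the function names `apply_blind_write` and `apply_increment` are hypothetical and do not come from the disclosure.

```python
# Hypothetical model: persistent storage is a dict mapping address -> value.

def apply_blind_write(storage, address, value):
    """Store a fixed value; applying it twice leaves the same result."""
    storage[address] = value

def apply_increment(storage, address, delta):
    """Read, then write a value derived from the read; not idempotent."""
    storage[address] = storage.get(address, 0) + delta

# Scenario: the modification is made, the system crashes before the log
# entry is declared complete, and recovery applies the same entry again.
blind = {0x10: 5}
apply_blind_write(blind, 0x10, 7)
apply_blind_write(blind, 0x10, 7)   # replay stores the same value

rmw = {0x10: 5}
apply_increment(rmw, 0x10, 2)
apply_increment(rmw, 0x10, 2)       # replay reads the already-updated value
```

After the replay, the blind write leaves 7 at the address, while the read-dependent operation leaves 9 instead of the intended 7.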

The novel contribution is a method to perform log operations using one operation code: a write to persistent storage. Each write is described by a persistent storage address and a value to store at that address. Multiple write operations are combined into an atomic unit by specifying a write count. Each operation that modifies persistent storage is described in a log that is itself stored in a location that persists across cluster crashes. Following a crash, the method is to read the log entries and apply the described modifications in the order in which they were logged.
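A log entry and the recovery pass described above might be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: the names `LogEntry` and `replay` are hypothetical, storage is modeled as a dict, and the write count is derived from the tuple length, whereas a stored log format would presumably record it explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class LogEntry:
    """One atomic unit: a group of (address, value) writes (hypothetical type)."""
    writes: Tuple[Tuple[int, bytes], ...]

    @property
    def write_count(self) -> int:
        # Number of writes combined into this atomic unit.
        return len(self.writes)

def replay(log: List[LogEntry], storage: Dict[int, bytes]) -> None:
    """Apply every logged write in the order of logging."""
    for entry in log:
        for address, value in entry.writes:
            storage[address] = value   # the single operation: a write

# Usage: two entries, the first grouping two writes into one unit.
storage: Dict[int, bytes] = {}
log = [
    LogEntry(((0, b"a"), (8, b"b"))),
    LogEntry(((0, b"c"),)),
]
replay(log, storage)
```

Because every operation is a plain write of a fixed value, running `replay` a second time over the same log leaves the storage unchanged, and a node needs no knowledge of higher-level operations to perform recovery.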

Log space is allocated from persistent storage. Some space is reserved, but additional space can be allocated on demand. If log space fills with pending operations and a new operation requires log space, the new operation must wait for one or more pending operations to complete and free log space so that its entry can be stored.
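The allocation behavior can be sketched as below. The class `LogSpace`, its method names, and in particular the `max_extra` cap on on-demand growth are assumptions made for illustration; the text does not say what bounds the additional space.

```python
class LogSpace:
    """Hypothetical accounting of log space (units are arbitrary)."""

    def __init__(self, reserved, max_extra):
        self.capacity = reserved      # space set aside up front
        self.max_extra = max_extra    # assumed limit on on-demand growth
        self.used = 0

    def try_allocate(self, size):
        """Return True if space was granted, False if the caller must wait."""
        shortfall = self.used + size - self.capacity
        if shortfall > 0:
            if shortfall > self.max_extra:
                return False          # log is full of pending operations
            self.capacity += shortfall
            self.max_extra -= shortfall
        self.used += size
        return True

    def free(self, size):
        """Release the space of a completed entry."""
        self.used -= size

# Usage: the reserved space is consumed first, then grown, then the caller waits.
space = LogSpace(reserved=4, max_extra=2)
granted_reserved = space.try_allocate(4)
granted_on_demand = space.try_allocate(2)
must_wait = not space.try_allocate(1)
space.free(3)                         # pending operations complete
granted_after_free = space.try_allocate(1)
```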

With the novel method, because each log entry is persistent, each operation can be considered complete to a client upon storage of the associated log entry. The modifications that a log entry describes can then be completed in the background to free the space the log entry consumes and reduce the work at recovery time. Once an entry is applied, and all entries before it are completed, it too can be considered complete, making space available for new entries.
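The completion rule in this paragraph can be sketched as follows, assuming each entry carries an "applied" flag and that entries may be applied in the background out of step with completion. The function name `advance_oldest_uncompleted` and the list-of-flags representation are hypothetical.

```python
def advance_oldest_uncompleted(applied, oldest):
    """Return the new index of the oldest uncompleted entry.

    An entry counts as complete only when it has been applied and every
    entry before it is complete, so the index stops at the first entry
    that has not yet been applied.
    """
    while oldest < len(applied) and applied[oldest]:
        oldest += 1
    return oldest

# Usage: entry 2 is applied, but entry 1 is not, so entry 2 cannot yet
# be considered complete and its space cannot be reused.
applied = [True, False, True]
first = advance_oldest_uncompleted(applied, 0)
applied[1] = True
second = advance_oldest_uncompleted(applied, first)
```

Here `first` is 1 and `second` is 3: once entry 1 is applied, entries 1 and 2 both become complete in a single step.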

A log's entries are chronologically ordered, enabling application in the same order as entry. Entries subsequent in time are stored subsequently in space. When a log's end in space is reached, entries subsequent in time are stored at its beginning in space.
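The wrap-around layout resembles a circular buffer. A minimal in-memory sketch follows; the class `CircularLog`, its fixed slot count, and its method names are assumptions for illustration, and nothing here is written to persistent storage.

```python
class CircularLog:
    """Hypothetical fixed-size log whose entries wrap to the beginning."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.head = 0      # position of the oldest uncompleted entry
        self.count = 0     # number of pending entries

    def append(self, entry):
        """Store an entry after the newest one; return False if full."""
        if self.count == len(self.slots):
            return False
        tail = (self.head + self.count) % len(self.slots)  # wraps at the end
        self.slots[tail] = entry
        self.count += 1
        return True

    def complete_oldest(self):
        """Mark the oldest entry complete and free its slot."""
        if self.count == 0:
            raise IndexError("no pending entries")
        entry = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return entry

    def pending(self):
        """Return pending entries in chronological order."""
        n = len(self.slots)
        return [self.slots[(self.head + i) % n] for i in range(self.count)]

# Usage: after "a" completes, "d" is stored at the beginning in space,
# yet chronological order is preserved when reading from the head.
ring = CircularLog(3)
for name in ("a", "b", "c"):
    ring.append(name)
full = not ring.append("d")
oldest = ring.complete_oldest()
ring.append("d")
```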

A pointer to the oldest uncompleted entry is stored in persisten...