Efficient Application Partitioning within a Cluster implementing High Availability

IP.com Disclosure Number: IPCOM000237834D
Publication Date: 2014-Jul-16
Document File: 3 page(s) / 47K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article describes a method that extends the error correction methods of a Redundant Array of Inexpensive Disks (RAID), which operate within a single controller or storage system, to a Shared Nothing (SN) architecture spanning multiple server nodes in a cluster, where data is stored on local disks. Several applications can then run across such a cluster with highly available data. Instead of a compute cluster with attached storage, the result is a highly scalable cluster that provides balanced compute and storage resources.

This is the abbreviated version, containing approximately 51% of the total text.


Redundant Arrays of Inexpensive Disks (RAID) [1] are built from disks attached to a single controller, or to controllers contained in a single server system or storage subsystem. In a server cluster where data is stored locally on the server nodes under such RAID protection, the system is protected against disk failures but not against server node failures. Such clusters usually rely on central storage, with a local RAID implemented on a storage server. Shared Nothing (SN) architectures [2] use only local storage, which must be protected against both disk and server failures. Disclosed here is a method that leverages data stored redundantly across server nodes using the functionality of a cluster filesystem such as IBM's General Parallel File System (GPFS) [3].

GPFS supports SN architectures through the File Placement Optimizer (FPO) [4] feature, which combines the local storage subsystems under one cluster filesystem. It supports data locality, i.e. data that is accessed locally on a node is stored on that node's local disks. It also provides replication of data within the filesystem. Replicas can be placed on distinct cluster nodes, so that after a node failure a copy of the data is still available on the remaining nodes. Striping of the data is also supported, which balances the I/O load across the nodes and makes better use of the distributed I/O bandwidth of the cluster.
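
The combination of data locality, replication on distinct nodes, and striping can be illustrated with a minimal sketch. It is not GPFS or FPO code; the function place_blocks, the node names, and the round-robin policy are assumptions made purely for illustration.

```python
from collections import Counter
from typing import List, Tuple


def place_blocks(writer: str, nodes: List[str], num_blocks: int) -> List[Tuple[str, str]]:
    """Return one (primary_node, secondary_node) pair per block.

    Hypothetical placement policy, for illustration only:
    - the primary copy stays on the writing node (data locality);
    - the secondary copy is striped round-robin over all other nodes,
      so it never shares a node with its primary copy.
    """
    if writer not in nodes:
        raise ValueError("writer must be a cluster node")
    others = [n for n in nodes if n != writer]
    if not others:
        raise ValueError("replication needs at least two nodes")
    return [(writer, others[i % len(others)]) for i in range(num_blocks)]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks("node1", nodes, 6)

# Count how many secondary copies each remote node holds.
secondary_load = Counter(secondary for _, secondary in placement)
```

With four nodes and six blocks written on node1, the secondary copies are spread evenly over the three remaining nodes, which is the load-balancing effect that striping is meant to provide.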

Such a cluster can be dedicated to one application, but it can also serve as a compute and storage cloud for multiple applications that use distinct data or share data with each other. In addition, if an application is able to recover from node failures, standby nodes can be provided that have access to all data within the cluster and can resume from the last stored dataset.
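
The standby-node idea can be sketched as follows, assuming that the application periodically writes its state to a directory in the cluster filesystem that every node can read. The file naming scheme and the functions save_checkpoint and load_latest_checkpoint are hypothetical and not part of the disclosure; a temporary directory stands in for the shared filesystem.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(shared_dir: Path, step: int, state: dict) -> Path:
    """Write the application state as a numbered checkpoint file."""
    path = shared_dir / f"checkpoint_{step:08d}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)  # rename, so readers never see a half-written file
    return path


def load_latest_checkpoint(shared_dir: Path) -> Optional[dict]:
    """Return the most recent stored state, or None if there is none."""
    # Zero-padded step numbers make lexicographic order equal step order.
    candidates = sorted(shared_dir.glob("checkpoint_*.json"))
    if not candidates:
        return None
    return json.loads(candidates[-1].read_text())


# Demo: the "active" node stores two datasets, the "standby" node resumes.
with tempfile.TemporaryDirectory() as d:
    shared = Path(d)
    save_checkpoint(shared, 1, {"processed": 100})
    save_checkpoint(shared, 2, {"processed": 200})
    resumed_state = load_latest_checkpoint(shared)
```

In this sketch the standby node needs no state of its own; everything it requires to take over is read from the replicated filesystem.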


Figure: Multiple applications on a cluster using a Shared Nothing storage architecture with striped data replication and standby server node

The Figure shows how the storage is distributed across several nodes: the data is stored locally as the primary replica, and a secondary replica is striped across other nodes to protect the overall cluster against data loss. The secondary replica is stored entirely on distinct nodes. As a result, if a server node drops out of the cluster, all data remains available within the cluster, i.e. a full mirror of the data is provided across the cluster.
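
The claim that a single node drop-out loses no data can be checked with a small simulation. This is a sketch under assumed names (build_layout, data_survives, nodes n1 to n3) and an assumed round-robin striping policy; it does not reproduce the actual layout shown in the Figure.

```python
from typing import Dict, List, Set


def build_layout(nodes: List[str], blocks_per_node: int) -> Dict[int, Set[str]]:
    """Map each block id to the set of nodes holding a copy of it.

    Every node owns blocks_per_node blocks (primary replica, stored
    locally); the secondary replica of each block is striped round-robin
    over the other nodes. Illustrative policy only.
    """
    if len(nodes) < 2:
        raise ValueError("a full mirror needs at least two nodes")
    layout: Dict[int, Set[str]] = {}
    block_id = 0
    for owner in nodes:
        others = [n for n in nodes if n != owner]
        for i in range(blocks_per_node):
            layout[block_id] = {owner, others[i % len(others)]}
            block_id += 1
    return layout


def data_survives(layout: Dict[int, Set[str]], failed: str) -> bool:
    """True if every block still has at least one copy after the failure."""
    return all(holders - {failed} for holders in layout.values())


nodes = ["n1", "n2", "n3"]
layout = build_layout(nodes, 4)
survives_any_single_failure = all(data_survives(layout, n) for n in nodes)
```

Because the two copies of every block always sit on different nodes, removing any one node leaves at least one copy of each block. A layout that kept both copies on the same node would fail this check.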

Disclosed here is a method of distributing applications across such clusters that leverages the highly available data repository, shares such data across diffe...