Browse Prior Art Database

RELIABLE, SCALABLE, AND HIGH­-PERFORMANCE DISTRIBUTED STORAGE: Data Distribution

IP.com Disclosure Number: IPCOM000234958D
Original Publication Date: 2014-Feb-19
Included in the Prior Art Database: 2014-Feb-19
Document File: 8 page(s) / 478K

Publishing Venue

Linux Defenders

Related People

Sage Weil: AUTHOR

Abstract

The work is a robust hierarchical data distribution function that places data in a large distributed storage system. When used in place of conventional allocation tables, the algorithm efficiently addresses a range of critical storage ­related placement issues, including statistically uniform data distribution, correlated failure and data safety, and data migration in dynamically changing storage clusters. Specifically, the algorithm places sets of data objects (either object replicas, or parity or erasure coded fragments) in a hierarchy of storage devices using a flexible rule language. The flexible hierarchical cluster description facilitates an adjustable balance between extremely efficient placement calculations and mapping stability when devices are added or removed from the cluster. It further provides the ability to enforce the separation of replicas across user­ defined failure domains, limiting exposure to data loss due to correlated failures.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 14% of the total text.

Page 01 of 8

RELIABLE, SCALABLE, AND HIGH­PERFORMANCE DISTRIBUTED STORAGE: Data Distribution

Authored by: Sage Weil

Abstract

The work is a robust hierarchical data distribution function that places data in a large distributed storage

                   system. When used in place of conventional allocation tables, the algorithm efficiently addresses a range of

                    critical storage­related placement issues, including statistically uniform data distribution, correlated failure and

 

                        data safety, and data migration in dynamically changing storage clusters. Specifically, the algorithm places sets               
of data objects (either object replicas, or parity or erasure coded fragments) in a hierarchy of storage devices

                  
using a flexible rule language. The flexible hierarchical cluster description facilitates an adjustable balance

                 between extremely efficient placement calculations and mapping stability when devices are added or removed              
from the cluster. It further provides the ability to enforce the separation of replicas across user­defined failure

                    domains, limiting exposure to data loss due to correlated failures.

Keywords: File systems, distributed storage systems, hierarchical data distribution

Introduction

The work described has been termed CRUSH (Controlled Replication Under Scalable Hashing), a pseudo

                random data distribution algorithm that efficiently and robustly distributes object replicas across a

                 heterogeneous, structured storage cluster. CRUSH is implemented as a deterministic function that maps an

                       input value-typically an object or object group identifier-to a list of devices on which to store object

                             replicas. This differs from conventional approaches in that data placement does not rely on any sort of per­file

                      
or per­object directory-CRUSH needs only a compact, hierarchical description of the devices comprising the

  

                               storage cluster and knowledge of the replica placement policy. This approach has two key advantages: first, it

                    
is completely distributed such that any party in a large system can independently calculate the location of any

                   object; and second, what little metadata is required is mostly static, changing only when devices are added or

                      removed.

CRUSH is designed to optimally distribute data to utilize available resources, efficiently reorganize data when

                    storage devices are added or removed, and enforce flexible constraints on object replica placement that

                   maximize data sa...