Method for defining and exploiting Replica Speed Bins in MapReduce clusters

IP.com Disclosure Number: IPCOM000230632D
Publication Date: 2013-Aug-28
Document File: 5 page(s) / 56K

Publishing Venue

The IP.com Prior Art Database

Abstract

Method of optimizing the performance of disk storage in MapReduce clusters by dividing physical disks into fast and slow regions and placing data replicas accordingly to achieve a preferred replica that can be accessed faster than other replicas.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

This article describes a method of optimizing the performance of disk storage in MapReduce clusters by dividing physical disks into fast and slow regions and placing data replicas accordingly to achieve a preferred replica that can be accessed faster than other replicas.

MapReduce is becoming the standard framework for large-scale analytics. MapReduce takes advantage of massive scaling of processors and local disks, and typically runs on compute clusters. Analytical tasks, from here on referred to as MapReduce jobs or simply jobs, run against a large dataset that is distributed onto the cluster in the form of blocks of a file. A job hence runs on many machines in parallel by running one or more mappers and reducers on each machine. Most of today's implementations of MapReduce, such as Apache Hadoop, are designed to perform best on commodity servers. The underlying storage configuration for such servers is therefore typically just a collection of disks, also referred to as "Just a Bunch Of Disks" (JBOD). A MapReduce distributed filesystem is prone to data loss both through the failure of individual disks in JBOD configurations (as opposed to RAID configurations) and through the loss of an entire server. To overcome this limitation, most MapReduce implementations replicate disjoint portions of the data onto different servers.

A typical usage pattern of MapReduce today is batch processing, where a given job runs its mappers and reducers on all available disks at once in order to increase streaming performance for the processors and make disk accesses as uniform as possible. Whether this pattern holds depends on the scheduler of the MapReduce implementation: if mappers are not dispatched speculatively on different replicas of the same data, only one replica of the data is used during the runtime of a given job. The other replicas are then maintained only to recover from failing nodes in the cluster.

Since second and third replicas of the data are only used to recover from node failures, it is beneficial to maximize the performance of accessing the first replica. Large hard disk drives (HDDs) used in MapReduce clusters exhibit a significant decline of streaming performance on inner tracks vs. outer tracks. The placement of data on the storage media in MapReduce clusters thus offers significant potential to optimize streaming throughput for MapReduce analytical workloads.
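As an illustration of this track-position effect, the following sketch models streaming throughput as declining linearly from the outermost tracks (lowest logical block addresses) to the innermost tracks. The 200 MB/s and 100 MB/s endpoints are assumed round numbers for a large HDD, not figures from the disclosure:

```python
def estimated_throughput_mbps(lba_fraction: float,
                              outer_mbps: float = 200.0,
                              inner_mbps: float = 100.0) -> float:
    """Estimate streaming throughput at a radial position on an HDD.

    lba_fraction: 0.0 = outermost track (lowest LBA), 1.0 = innermost.
    A simple linear interpolation between the assumed outer- and
    inner-zone speeds; real drives decline zone by zone, not smoothly.
    """
    if not 0.0 <= lba_fraction <= 1.0:
        raise ValueError("lba_fraction must be in [0, 1]")
    return outer_mbps + (inner_mbps - outer_mbps) * lba_fraction

# The outer region reads roughly twice as fast as the inner region:
print(estimated_throughput_mbps(0.0))  # 200.0
print(estimated_throughput_mbps(1.0))  # 100.0
```

Under this model, placing the preferred replica in the low-LBA half of the disk keeps its streaming reads in the faster region.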

A method is proposed to define different pools on the storage media of a MapReduce cluster, each offering a different access performance. These pools will henceforth be referred to as Replica Speed Bins (R...
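A minimal sketch of how such speed bins might drive replica placement, assuming an HDFS-style policy that puts each replica on a distinct node. The two-bin split, function names, and node labels are illustrative assumptions, not details from the disclosure:

```python
# Illustrative two-bin scheme: each disk's LBA space is split into a
# "fast" bin (outer tracks) and a "slow" bin (inner tracks). The first,
# preferred replica of every block lands in a fast bin; the recovery
# replicas land in slow bins.
FAST, SLOW = "fast", "slow"

def bin_for_replica(replica_index: int) -> str:
    """First replica goes to the fast bin; recovery replicas to the slow bin."""
    return FAST if replica_index == 0 else SLOW

def place_block(block_id: str, nodes: list, replication: int = 3) -> list:
    """Assign each replica of a block to a (node, speed bin) pair.

    Replicas are placed on distinct nodes, mirroring the replication
    scheme described above.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for requested replication")
    return [(nodes[i], bin_for_replica(i)) for i in range(replication)]

placement = place_block("blk_0001", ["node1", "node2", "node3"])
print(placement)
# [('node1', 'fast'), ('node2', 'slow'), ('node3', 'slow')]
```

The scheduler can then prefer the replica in the fast bin for mapper dispatch, falling back to slow-bin replicas only on node failure.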