Method for Partitioning MapReduce Tasks

IP.com Disclosure Number: IPCOM000237399D
Publication Date: 2014-Jun-17
Document File: 1 page(s) / 26K

Publishing Venue

The IP.com Prior Art Database

Related People

Joshua Walters: INVENTOR [+3]

Abstract

A method is disclosed for dynamically partitioning MapReduce tasks so that complex partition structures are generated with a minimum number of files. Keeping the file count to a minimum reduces the number of output files and assists in developing advanced file partition structures.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 83% of the total text.

Description

Disclosed is a method for partitioning MapReduce tasks dynamically to generate complex partition structures with a minimum number of files, wherein complex partition structures can include thousands of partitions.

In accordance with the method, records in a data pipeline are sampled to measure the size of the records, in bytes, that fall into each partition.  A partition is a subset of the data that belongs to a specific logical group.  Metrics obtained from the sampling are used to project the size of a given partition relative to all other partitions.  A corresponding fraction of the available reducers is then assigned to the given partition.  A reducer is a computer in a grid that processes data as part of the MapReduce paradigm.  The method uses fractional reducers in the partitioner so that a single reducer can be reused for several small partition tasks.  The partitioner splits the data into partitions and sends each partition to specific reducers.
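The steps in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `reducer_shares` and `pick_reducer`, the partition names, the CRC32 key hash, and the idea of laying partitions end to end on a line from 0 to N are all hypothetical choices made for this example.

```python
import zlib


def reducer_shares(sampled_bytes, num_reducers):
    """Give each partition a (start, width) interval on the line [0, num_reducers).

    sampled_bytes maps a partition name to the bytes observed for it in the sample.
    The width is the partition's size ratio multiplied by the number of reducers,
    so it may be fractional (assumption: intervals are laid out in name order).
    """
    total = sum(sampled_bytes.values())
    if total <= 0:
        raise ValueError("sample contains no bytes")
    shares = {}
    start = 0.0
    for name in sorted(sampled_bytes):
        width = num_reducers * sampled_bytes[name] / total
        shares[name] = (start, width)
        start += width
    return shares


def pick_reducer(partition, record_key, shares, num_reducers):
    """Choose a reducer index for one record of the given partition."""
    start, width = shares[partition]
    # Stable hash of the key, scaled into [0, 1), spreads records over the interval.
    u = (zlib.crc32(record_key.encode("utf-8")) % 10_000) / 10_000
    index = int(start + u * width)
    # Guard against floating-point drift at the upper edge.
    return min(index, num_reducers - 1)


# Hypothetical sample: one large partition and two small ones, 10 reducers.
sample = {"desktop_news": 990, "mobile_news": 5, "mobile_sports": 5}
shares = reducer_shares(sample, 10)
small_a = pick_reducer("mobile_news", "record-1", shares, 10)
small_b = pick_reducer("mobile_sports", "record-2", shares, 10)
```

In this sketch the two small partitions each receive a width of 0.05 reducers, and both intervals fall inside the last reducer, so `small_a` and `small_b` are the same index. That is one possible way to realize the reuse of reducers for small partition tasks.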

For example, if a mobile news website accounts for 0.73% of all data processed in an hour, the number of available reducers, ‘N’, is multiplied by that percentage.  Assuming a total of 1000 reducers, the mobile news website...
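The multiplication set up in this example can be worked through in a few lines. The helper name `reducer_share` is hypothetical, and the sketch stops at the product itself: how the fractional part of the result is handled is not stated in the text above, so it is not modeled here.

```python
def reducer_share(data_percent, total_reducers):
    """Return the (possibly fractional) number of reducers for a partition."""
    return total_reducers * data_percent / 100.0


# Figures from the example: 0.73% of the data, 1000 reducers in total.
share = reducer_share(0.73, 1000)
print(f"{share:.2f} reducers")  # 7.30 reducers
```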