
System and method for reducing the chances of failed tasks by selecting the node for running Map Reduce tasks on the basis of past statistics

IP.com Disclosure Number: IPCOM000245467D
Publication Date: 2016-Mar-11
Document File: 3 page(s) / 63K

Publishing Venue

The IP.com Prior Art Database

Abstract

Hadoop is a software framework that supports data-intensive, distributed applications. There are two aspects of Hadoop that are important to understand:

1) MapReduce is a software framework introduced to support distributed computing on large data sets across clusters of computers.

2) The Hadoop Distributed File System (HDFS) is where Hadoop stores its data. This file system spans all the nodes in a cluster. Effectively, HDFS links together the data that resides on many local nodes, making the data part of one big file system.

In Hadoop, parallel processing is achieved using MapReduce jobs. The cluster has a master-slave architecture (a Name Node and Data Nodes, the worker nodes). Each file is broken into blocks, and three copies of each block are kept on three different nodes. When data is to be processed, the Name Node provides the list of nodes where the block is present, and the Mapper task is executed preferably on a local node where the data block resides. While the Job Tracker will always try to pick nodes with local data for a Map task, it may not always be able to do so; one reason might be that all of the nodes with local data already have too many other tasks running and cannot accept any more.

When a particular node is selected for a Mapper task, it is possible that the selected node is not performing well and that the chance of failure of the current Mapper on that node is high. The Job Tracker does not consider this when deciding on which node to run the Mapper. The number of worker nodes in a cluster may be huge (hundreds of them), and some nodes will always be performing better than others. If past statistics of these nodes, such as corrupt sectors on each node, the past success rate on a particular node, and the number of times a node has failed in the past, were collected, the Name Node could make a smarter decision about where to run the Mapper.

This article proposes using the past statistics of each node when selecting which of the three Data Nodes holding a copy of a data block should perform the parallel processing.
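As a rough illustration of this idea (not part of the original disclosure; the function name, the statistics fields, and the scoring weight are all hypothetical assumptions), the Name Node could rank the replica-holding nodes by their historical success rate and penalize nodes with known corrupt sectors:

```python
# Hypothetical sketch: rank replica-holding nodes by past statistics.
# All names and the 0.1 penalty weight are illustrative, not Hadoop APIs.

def pick_best_node(replica_nodes, node_stats):
    """Pick the replica node with the best historical success rate.

    node_stats maps node id -> dict with 'successes', 'failures',
    and 'corrupt_sectors' counters collected over past jobs.
    """
    def score(node):
        s = node_stats.get(node, {"successes": 0, "failures": 0,
                                  "corrupt_sectors": 0})
        total = s["successes"] + s["failures"]
        success_rate = s["successes"] / total if total else 1.0
        # Penalize nodes with known corrupt sectors.
        return success_rate - 0.1 * s["corrupt_sectors"]

    return max(replica_nodes, key=score)

stats = {
    "N1": {"successes": 90, "failures": 10, "corrupt_sectors": 0},
    "N2": {"successes": 50, "failures": 50, "corrupt_sectors": 2},
    "N3": {"successes": 80, "failures": 20, "corrupt_sectors": 0},
}
print(pick_best_node(["N1", "N2", "N3"], stats))  # N1
```

Any scoring function over the collected statistics would do; the point is only that the Name Node chooses among the three replicas using history rather than ignoring it.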




User scenario :

Consider that we have one namenode and multiple data nodes.

One block B1 of a file is present on three data nodes: N1, N2, and N3. Suppose N2 has a sector-level corruption and block B1 is present only in that sector, S1. Now suppose a job that requires reading block B1 arrives at the Name Node. Currently, if N2 is selected, we will get a block read failure. There is no mechanism right now to record this historical information, so if the same node N2 is selected again to read B1, the same failure will be repeated, wasting a considerable amount of processing time.

Solution Statement :

The intelligent algorithm here can prevent this situation effectively. Let's take the same example as in the user scenario. With the newly proposed algorithm, the Name Node keeps all the past statistics of block read failures on these nodes. Suppose N2 has a sector-level corruption and block B1 is present only in that sector, S1. This information is sent to the Name Node when the read failure occurs. This way, the Name Node holds the error information: (node N2, block B1, sector S1).

Now, when a job that requires reading block B1 arrives at the Name Node, the Name Node will use the information about the past failure (the sector-level failure described above) to make its decision. In this case, the Name Node will not select node N2 for reading block B1; instead, it will select either node N1 or node N3.

The proposed technique is also smart enough to handle the following scenario: once node N2 holding block B1 is corrected, by replacing the disk or by other means, node N2 sends updated information about block B1 to the Name Node. The Name Node then clears the error information about block B1 on N2, and this updated information is used by the Name Node when selecting a node for the execution of the next job. This way we avoid a block read failure and improve performance.
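The clear-on-repair step could be sketched in the same spirit (again with hypothetical names; a real system would authenticate and verify the Data Node's repair report before clearing the record):

```python
# Hypothetical sketch of clearing a failure record once the node is repaired.

class FailureRegistry:
    def __init__(self):
        self.failed = {}  # block id -> set of node ids with recorded failures

    def record_failure(self, node, block):
        self.failed.setdefault(block, set()).add(node)

    def clear_failure(self, node, block):
        # Called when the Data Node reports the block is readable again
        # (e.g. after a disk replacement).
        self.failed.get(block, set()).discard(node)

    def is_healthy(self, node, block):
        return node not in self.failed.get(block, set())

reg = FailureRegistry()
reg.record_failure("N2", "B1")   # read failure reported
reg.clear_failure("N2", "B1")    # N2's disk replaced; B1 readable again
print(reg.is_healthy("N2", "B1"))  # True
```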

The advantages of the new technique:
1) The chance of failure of the mapper tasks will be reduced significantly.

2) In some situations, one particular node is always selected by the Job Tracker while...