Browse Prior Art Database

Job Distribution Framework for Large Scale Clusters and Grids

IP.com Disclosure Number: IPCOM000150663D
Original Publication Date: 2007-Apr-19
Included in the Prior Art Database: 2007-Apr-19
Document File: 7 page(s) / 324K

Publishing Venue

IBM

Abstract

The presented framework utilizes a novel API to allow flexible and generic implementations. The jobs designed therefore can run in various environments because the framework can completely hide the communication and adaptation from the job. Benchmarks show that the framework can thereby achieve good to superior speed-ups.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 20% of the total text.

Page 1 of 7

Job Distribution Framework for Large Scale Clusters and Grids

This paper presents the design and performance analysis of a new framework for parallelization of loosely coupled computations on heterogeneous grid systems. Its main contribution is the separation of how to split up a given job from how to adapt it to the parallel computer. This is achieved by a specialized API and yields a generic, performant framework.

Keywords: grid computing, large scale, cluster, heterogeneous, self organizing, adaptive.

1 Introduction

The recent years have seen a constantly growing demand for computing power. Engineers evaluate prototypes using simulations and scientists use programs to test new theories. Crucial to their progress is the efficient utilization of all available computing resources. To cater for their demands, many companies and universities have set up large scale cluster computers or compute grids. However, efficient parallel programming is significantly more complex than the development of serial software and not everyone has the time and experience to deal with its pitfalls. Therefore it is desirable to have a framework which abstracts away the parallel nature of the computing system and allows the user to use it similarly to a traditional system.

In the past there has been a great multitude of frameworks targeted at different problem sets, ranging from physical simulations [1] to graph algorithms [8]. As such a framework maps a job to a parallel computing system, its applicability is limited by two characteristics: For which kinds of jobs is the framework designed for and on which systems can it run?

A significant amount of compute intensive jobs consist of loosely coupled, self-contained items. Examples include Monte Carlo methods, ray tracing and genetic algorithms. This work focuses on such jobs which are dominated by computation time and generate only little output, so that the costs of communicating the results are dominated by fixed setup costs.

Those jobs are often also referred to as embarrassingly parallel [4]. The parallelization is generally straightforward, but looking closer there occur a number of issues which can severely limit scalability. Therefore the framework described in the following sections strives to segregate the domain specific knowledge of job decomposition from the know-how of how to deal with scalability issues.

The paper is organized as follows: First the related work is presented and based on this the API of the framework is developed. It concludes with a performance analysis, rating the framework against competing approaches.

2 Related Work

The most prominent example of frameworks geared towards embarrassingly parallel computations is the Berkeley Open Infrastructure for Network Computing (BOINC)
[5]. It is based on the client-server pattern, where multiple clients connect to...