Method of Compute Resource Allocation in a Batch Job Submission Environment

IP.com Disclosure Number: IPCOM000118773D
Original Publication Date: 1997-Jul-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 2 page(s) / 101K

Publishing Venue

IBM

Related People

Brase, BA: AUTHOR [+2]

Abstract

Disclosed is a method of assigning compute resource to various projects in a shared compute environment. In this method, the allocation is done dynamically based on the progress of the project rather than strictly based on the forecast done by each project. This way, all projects make good progress towards the end goal.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      In a distributed computing environment, compute resources are
often shared among a number of projects.  Typically, resources are
assigned based on the forecast done by each project.  As time
passes, not all projects make the progress planned in the
forecast.  If all projects are equally important, then it is crucial
that the projects which are behind receive more compute resource than
the projects which are ahead.

      This disclosure illustrates a method of allocating resources
dynamically based on the progress of the projects.  The example given
here is of a system simulation environment where compute resource is
shared among various projects.  The compute resource is shared by
running a batch job submission application (such as the 'loadleveler'
application, which IBM offers for running on RS/6000* clusters of
machines).

      In the 'loadleveler' application, the compute resources are
assigned by allocating a number of machines to each available job
class.  The job classes are then assigned to different projects.
However, all machines are available to all job classes on a second
priority basis if a machine is freed up and no jobs exist in the job
class to which that machine is assigned.
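
      The two-level priority described above can be sketched as
follows.  This is a minimal illustration, not the 'loadleveler'
implementation: the function pick_job, the class names and the rule
for choosing among other classes (first non-empty queue) are all
assumptions made for the example.

```python
from collections import deque


def pick_job(machine_class, queues):
    """Choose the next job for a machine that has just been freed.

    machine_class: the job class the machine is assigned to.
    queues: dict mapping job class name -> deque of waiting jobs.
    Returns (job_class, job), or None if no job is waiting anywhere.
    """
    # First priority: jobs from the class that owns the machine.
    own_queue = queues.get(machine_class)
    if own_queue:
        return machine_class, own_queue.popleft()

    # Second priority: any other class with waiting jobs.
    # Taking the first non-empty queue is an assumption of this sketch.
    for job_class, queue in queues.items():
        if queue:
            return job_class, queue.popleft()

    return None


queues = {
    "classA": deque(),              # ProjectA has nothing waiting
    "classB": deque(["b1", "b2"]),  # ProjectB has two jobs waiting
}
print(pick_job("classB", queues))  # machine runs a job of its own class
print(pick_job("classA", queues))  # machine borrows a job from classB
```

      Once a borrowed job has started, the machine stays occupied
until that job ends, which matters for the problem described next.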

      To describe the problem, assume that there are 'N' projects
(ProjectA, ProjectB, ..., ProjectN) running in a batch job
environment.  Each project is assigned (or guaranteed) a certain
number of machines in the cluster based on the forecast done by each
project.

      The problem with the above method occurs when a project (let us
assume, ProjectA) runs into some technical difficulties and is
unable to run its simulation jobs.  All machines assigned to
ProjectA will then start running jobs from other projects.  After the
problem is fixed and ProjectA simulation is restarted, ProjectA will
have to wait until all the jobs running on machines assigned to it
are finished.  The compute time lost to down time and waiting time
for ProjectA cannot be recovered unless the machines are reassigned
manually.
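
      A small worked example may make the size of the loss concrete.
The function name and all figures below are hypothetical; the sketch
assumes that every machine assigned to ProjectA is idle for ProjectA
during the whole down time and then must finish one borrowed job.

```python
def lost_cpu_hours(downtime_hours, remaining_hours_per_machine):
    """CPU hours ProjectA loses on the machines assigned to it.

    downtime_hours: how long ProjectA was unable to run its jobs.
    remaining_hours_per_machine: one entry per machine assigned to
        ProjectA, giving the hours still needed by the other project's
        job running on that machine when ProjectA restarts.
    """
    machines = len(remaining_hours_per_machine)
    down_time_loss = machines * downtime_hours
    waiting_loss = sum(remaining_hours_per_machine)
    return down_time_loss + waiting_loss


# Hypothetical figures: 4 machines, 8 hours of down time, and borrowed
# jobs that still need 1.5, 0.5, 3.0 and 2.0 hours to finish.
print(lost_cpu_hours(8, [1.5, 0.5, 3.0, 2.0]))  # 4 * 8 + 7.0 = 39.0
```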

      To solve the above problem, compute resource is dynamically
allocated to all projects based on the amount of simulation
accomplished within a predefined period on a daily, weekly or monthly
basis as desired.
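
      One possible reading of this step is sketched below.  The
abbreviated text does not give the exact reallocation rule, so the
rule used here (machines split in proportion to each project's
remaining CPU hours, with largest-remainder rounding) is an
assumption, as are the function name and the sample figures.

```python
def reallocate(total_machines, forecast_hours, done_hours):
    """Split machines among projects for the next period.

    forecast_hours: dict project -> CPU hours forecast for the goal.
    done_hours: dict project -> CPU hours of simulation accomplished.
    Returns dict project -> number of machines (sums to total_machines
    unless no project has work remaining).
    """
    # Work still outstanding for each project, never negative.
    remaining = {
        p: max(forecast_hours[p] - done_hours.get(p, 0.0), 0.0)
        for p in forecast_hours
    }
    total_remaining = sum(remaining.values())
    if total_remaining == 0:
        return {p: 0 for p in forecast_hours}

    # Proportional share (assumed rule), then largest-remainder
    # rounding so the integer counts add up to total_machines.
    exact = {p: total_machines * r / total_remaining
             for p, r in remaining.items()}
    alloc = {p: int(x) for p, x in exact.items()}
    leftover = total_machines - sum(alloc.values())
    by_fraction = sorted(exact, key=lambda p: exact[p] - alloc[p],
                         reverse=True)
    for p in by_fraction[:leftover]:
        alloc[p] += 1
    return alloc


forecast = {"ProjectA": 1000, "ProjectB": 1000}
done = {"ProjectA": 200, "ProjectB": 600}   # ProjectA is behind
print(reallocate(10, forecast, done))  # {'ProjectA': 7, 'ProjectB': 3}
```

      Running such a function once per predefined period (daily,
weekly or monthly) would shift machines toward whichever project has
fallen behind, without manual reassignment.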

      In this method, each project is asked to forecast its
simulation needs in terms of the CPU hours needed to achieve its
simulation goal.  The forecast can be done for a longer period such
as a year or six...