Flexibly adjusting compute node slot number based on job load in a grid environment
Disclosure Number: IPCOM000248087D
Publication Date: 2016-Oct-25
Document File: 1 page(s) / 32K

Publishing Venue

The Prior Art Database


A method to dynamically adjust the number of job slots on a compute node based on its load level. In most modern job scheduling software, every compute node is configured with a fixed number of job slots (the number of jobs that can run concurrently on it). This number is normally set equal to the number of CPUs (or cores, or threads) the compute node has. When all the job slots on all compute nodes are in use, new jobs wait in the job scheduling queue. The jobs at the head of the queue are dispatched only when running jobs finish and vacant job slots appear on compute nodes. In a complex environment, not all jobs are of the same type or impose the same load. Some jobs use hardware resources intensively and some do not. The grid controller (master node) of a grid is aware of load information from all the compute nodes. When it detects that some compute nodes' slots are fully used but their hardware resources are not fully utilized, it can use this method to dynamically adjust the slot cap of those compute nodes and dispatch new jobs to run on these lightly loaded compute nodes.
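The detection step described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the `NodeLoad` fields, the function name, and both thresholds are hypothetical and would need tuning for a real grid.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Load snapshot for one compute node (hypothetical fields)."""
    name: str
    slots_total: int   # configured slot cap
    slots_used: int    # slots currently occupied by running jobs
    cpu_util: float    # CPU utilization as a fraction, 0.0 to 1.0
    mem_free_mb: int   # currently available memory in MB


def slot_full_but_underused(nodes, cpu_threshold=0.5, min_free_mem_mb=4096):
    """Return nodes whose slots are all used but whose hardware is lightly loaded.

    The thresholds are assumptions chosen for illustration only.
    """
    return [
        n for n in nodes
        if n.slots_used >= n.slots_total
        and n.cpu_util < cpu_threshold
        and n.mem_free_mb >= min_free_mem_mb
    ]
```

Nodes returned by such a check are the candidates whose slot cap the grid controller could raise.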

This is the abbreviated version, containing approximately 66% of the total text.

To use this method, the grid controller (master node) needs to know the potential hardware resource usage of queued jobs. Otherwise, dispatching a job that uses a lot of hardware resources to a slot-full compute node may overload that node and thus affect all jobs running on it. This information can be obtained in two ways.

A. Estimate the CPU, memory, and IO usage of the new job from the historical usage of the same or similar jobs

B. Job users specify the hardware resource usage of their new jobs when submitting them to the queue (grid controller)

Method A) can be used in an environment where jobs of the same type share similar runtime behavior and use similar amounts of hardware resources. The grid controller can determine whether this is the case by analyzing the historical job usage.
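One way to check whether history is consistent enough for method A) is sketched below. It is an assumption of this example, not something the disclosure specifies, that consistency is measured by the coefficient of variation (standard deviation divided by mean) of past usage samples. The function name and the `max_cv` threshold are hypothetical.

```python
from statistics import mean, pstdev


def estimate_from_history(samples, max_cv=0.25):
    """Estimate a job's resource usage from past runs of the same job type.

    `samples` is a list of non-negative usage values (for example, peak
    memory in MB). Returns the mean usage when the history is consistent,
    or None when there is too little history or the spread is too wide,
    in which case a user-specified value (method B) would be needed.
    """
    if len(samples) < 2:
        return None  # not enough history to judge consistency
    avg = mean(samples)
    if avg == 0:
        return 0.0  # every past run used nothing of this resource
    cv = pstdev(samples) / avg  # coefficient of variation
    return avg if cv <= max_cv else None
```

For example, three past runs that peaked at 900, 1000, and 1100 MB yield an estimate of 1000 MB, while runs at 50, 100, and 2000 MB are rejected as too inconsistent.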

Method B) can be used in most environments.

The grid controller can categorize all queued jobs by the nature of their resource usage, using either method A) or B). When all job slots on all compute nodes are used up, the grid controller can dynamically raise the slot cap on compute nodes that have a large amount of free memory and dispatch memory-intensive jobs to them. The memory estimate of the new job must be smaller than the compute node's currently available memory. The same approach applies to all types of hardware resources.
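The memory case can be sketched as below. This is an illustrative sketch under stated assumptions: the `Node` and `Job` types, their fields, and the choice of raising the cap by one slot on the node with the most free memory are all hypothetical, not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    slot_cap: int      # current slot cap (adjustable)
    running: int       # number of jobs currently running
    mem_free_mb: int   # currently available memory in MB


@dataclass
class Job:
    job_id: str
    mem_estimate_mb: int  # from method A) or B)


def dispatch_when_full(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Place a memory-intensive job when every slot on every node is used.

    Returns the chosen node, or None if normal scheduling should handle
    the job (a vacant slot exists) or no node has enough free memory.
    """
    if any(n.running < n.slot_cap for n in nodes):
        return None  # a vacant slot exists; no cap change is needed
    # The job's memory estimate must be smaller than the node's free memory.
    candidates = [n for n in nodes if job.mem_estimate_mb < n.mem_free_mb]
    if not candidates:
        return None  # job stays in the queue
    target = max(candidates, key=lambda n: n.mem_free_mb)
    target.slot_cap += 1                        # raise the slot cap by one
    target.running += 1                         # the new job occupies the new slot
    target.mem_free_mb -= job.mem_estimate_mb   # reserve the estimated memory
    return target
```

Subtracting the estimate from the node's free memory right away keeps a second dispatch from counting the same memory twice before the next load report arrives. The same pattern could be repeated for CPU or IO by swapping the field being compared.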

To fully utilize grid's hardware resources, the grid node doesn...