
Topology-Aware Resource Allocation in Multi-GPU Architecture

IP.com Disclosure Number: IPCOM000247369D
Publication Date: 2016-Aug-29
Document File: 5 page(s) / 172K

Publishing Venue

The IP.com Prior Art Database


Disclosed is a topology-aware resource allocation algorithm to maximize the application performance in a Graphics Processing Unit (GPU) Cluster.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.



A Graphics Processing Unit (GPU) cluster, which contains multiple GPUs in a system, provides good performance at a relatively low energy consumption rate. Because each communication link has a different maximum bandwidth, the aggregated throughput an application achieves depends on which GPUs it is assigned to.

The novel contribution is a topology-aware resource allocation algorithm to maximize the application performance.

A cluster of Graphics Processing Units (GPUs) forms a network with a certain topology. Depending on how the units are connected, each communication link has a different data rate. A micro-benchmark captures the topology and profiles the aggregated throughput while moving data among GPUs and between Central Processing Units and Graphics Processing Units (CPUs-GPUs). When a user/application requests a number of GPUs to run a data-intensive application, the novel algorithm consults the profile and assigns the application to the GPUs where performance (with respect to execution time and delay) can be maximized.

The steps for a preferred embodiment include:

1. Profiling aggregated throughput: A micro-benchmark profiles the topology and aggregated throughput

2. Workload placement: A data-intensive application requests multiple GPUs to accelerate its performance

3. Resource allocation: The resource allocator uses the profiling information and places workloads/applications on the GPUs where performance can be maximized
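The three steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the GPU IDs, link throughputs, and helper names are assumptions, and the scoring rule (maximize total pairwise throughput within the allocated subset) is one plausible reading of "performance can be maximized".

```python
# Hypothetical sketch of the profile -> request -> allocate flow.
from itertools import combinations

def profile_throughput(gpus, measure_link):
    """Step 1: micro-benchmark every GPU pair and record the measured
    throughput (e.g., GB/s) in a profile table."""
    return {(a, b): measure_link(a, b) for a, b in combinations(gpus, 2)}

def allocate(profile, gpus, requested):
    """Steps 2-3: for a request of `requested` GPUs, pick the subset
    whose total pairwise link throughput is highest."""
    def score(subset):
        return sum(profile[pair] for pair in combinations(subset, 2))
    return max(combinations(gpus, requested), key=score)

# Toy topology (assumed): GPUs 0 and 1 share a fast link (e.g., the same
# PCIe switch); GPU 2 sits across a slower inter-socket link.
links = {(0, 1): 12.0, (0, 2): 6.0, (1, 2): 6.0}
profile = profile_throughput([0, 1, 2], lambda a, b: links[(a, b)])
print(allocate(profile, [0, 1, 2], 2))  # -> (0, 1), the best-connected pair
```

With the profile table in hand, a job requesting two GPUs lands on the pair with the fastest link rather than on an arbitrary pair.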

Figure 1: Process flow, embodiment

• Dotted lines represent the profiling processes
• Solid lines represent the resource allocation processes
• The resource allocator places jobs on the GPUs where performance can be maximized

Proposed Solution 1 (General Solution)

A micro-benchmark can run at any time to profile communication throughput. Communication can be either unidirectional or bidirectional, and includes host-to-device (HTD), device-to-host (DTH), and device-to-device (DTD) transfers.
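A direction-aware micro-benchmark of this kind might be structured as below. This is a sketch under stated assumptions: the copy callables stand in for the real transfer primitives (e.g., CUDA memcpy calls), and here a plain in-memory copy models the data movement so the structure is runnable anywhere.

```python
# Minimal sketch of a direction-aware throughput micro-benchmark.
import time

def measure(copy_fn, nbytes, repeats=5):
    """Time `copy_fn(nbytes)` and return throughput in bytes/second."""
    start = time.perf_counter()
    for _ in range(repeats):
        copy_fn(nbytes)
    elapsed = time.perf_counter() - start
    return repeats * nbytes / elapsed

def profile_directions(copies, nbytes=1 << 20):
    """Profile each direction (HTD, DTH, DTD) into a throughput table,
    which would then be stored in the profiling database."""
    return {direction: measure(fn, nbytes) for direction, fn in copies.items()}

# Stand-in transfers: a bytearray slice-copy models the actual DMA.
buf = bytearray(1 << 20)
copies = {
    "HTD": lambda n: bytes(buf[:n]),
    "DTH": lambda n: bytes(buf[:n]),
    "DTD": lambda n: bytes(buf[:n]),
}
table = profile_directions(copies)
```

In a real system each entry of `copies` would wrap the corresponding transfer direction on actual hardware, and the resulting table rows would be written to the profiling database described below.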




The benchmark measures the aggregated throughput when multiple links are utilized. Profiling information is stored in database tables. Based on the profiling information, the resource allocator places the job on the GPUs that maximize performance (e.g., minimize the application execution time). Profiling every communication pair, however, may not be scalable: for a 16-GPU system the number of measurements grows combinatorially, on the order of 16C1 + 16C2 + 16C2.
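The scalability concern can be made concrete by evaluating the combination count for a 16-GPU system. The reading of the three terms as single-device runs plus unidirectional and bidirectional pair measurements is an assumption; the arithmetic itself is just the stated expression.

```python
# Count the profiling measurements for n = 16 GPUs, per the
# 16C1 + 16C2 + 16C2 expression above (term interpretation assumed:
# single-device runs + unidirectional pairs + bidirectional pairs).
from math import comb

n = 16
measurements = comb(n, 1) + comb(n, 2) + comb(n, 2)
print(measurements)  # -> 16 + 120 + 120 = 256
```

Even before considering larger GPU subsets, hundreds of benchmark runs are needed, which motivates the topology-only variant in Proposed Solution 2.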

Proposed Solution 2 (Specific Solution: topology only, without throughput profiling)

A micro-benchmark can run at any time to find the topology. It...