Topology-Aware Resource Allocation in Multi-GPU Architecture
Publication Date: 2016-Aug-29
The IP.com Prior Art Database
Disclosed is a topology-aware resource allocation algorithm that maximizes application performance in a Graphics Processing Unit (GPU) cluster.
A Graphics Processing Unit (GPU) cluster, which contains multiple GPUs in a single system, provides good performance at relatively low energy consumption. Depending on how the GPUs are utilized, the aggregated throughput available to applications differs, because each communication link has a unique maximum bandwidth.
The novel contribution is a topology-aware resource allocation algorithm that maximizes application performance.
A cluster of Graphics Processing Units (GPUs) forms a network with a certain topology. Depending on how the units are connected, each communication link carries a different data rate. A micro-benchmark captures the topology and profiles the aggregated throughput while moving data among GPUs and between Central Processing Units and Graphics Processing Units (CPUs-GPUs). When a user or application requests a number of GPUs to run a data-intensive application, the novel algorithm consults the profile and assigns the application to the GPUs where performance (with respect to execution time and delay) is maximized.
The steps for a preferred embodiment include:
1. Profiling aggregated throughput: A micro-benchmark profiles the topology and the aggregated throughput.
2. Workload placement: A data-intensive application requests multiple GPUs to accelerate its performance.
3. Resource allocation: The resource allocator uses the profiling information and places workloads/applications on the GPUs where performance can be maximized.
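The three steps above can be sketched as a simple allocator over a profiled throughput table. The pairwise bandwidth figures and the four-GPU topology below are illustrative assumptions, not measurements from the disclosure; the subset search stands in for step 3 (resource allocation).

```python
from itertools import combinations

# Hypothetical profiled aggregated throughput (GB/s) per GPU pair,
# as the micro-benchmark of step 1 might record it (illustrative values).
pair_bw = {
    (0, 1): 32.0, (0, 2): 16.0, (0, 3): 16.0,
    (1, 2): 16.0, (1, 3): 16.0, (2, 3): 32.0,
}

def allocate(num_gpus, n_total=4):
    """Step 3: pick the GPU subset whose internal links give the
    highest aggregated throughput for the requesting workload."""
    return max(
        combinations(range(n_total), num_gpus),
        key=lambda s: sum(pair_bw[p] for p in combinations(s, 2)),
    )

print(allocate(2))  # -> (0, 1): the pair connected by the fastest link
```

In this sketch, a two-GPU request lands on GPUs 0 and 1 because their link offers 32 GB/s rather than 16 GB/s.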
Figure 1: Process flow, embodiment
• Dotted line represents the profiling processes
• Solid line represents resource allocation processes
• Resource allocator places the jobs to the GPUs where the performance can be maximized
Proposed Solution 1 (General Solution)
A micro-benchmark can run at any time to profile communication throughput. Communication can be either unidirectional or bidirectional, and includes host-to-device (HTD), device-to-host (DTH), and device-to-device (DTD) transfers.
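A minimal timing harness for such a micro-benchmark might look like the following. Since a real HTD/DTH/DTD transfer would use a GPU API such as `cudaMemcpy`, the host-to-host copy here is only a stand-in; the function name `measure_throughput` and the buffer size are assumptions for illustration.

```python
import time
import numpy as np

def measure_throughput(copy_fn, nbytes, repeats=5):
    """Time repeated copies and return throughput in GB/s.
    copy_fn stands in for an HTD, DTH, or DTD transfer."""
    src = np.empty(nbytes, dtype=np.uint8)
    dst = np.empty(nbytes, dtype=np.uint8)
    t0 = time.perf_counter()
    for _ in range(repeats):
        copy_fn(dst, src)
    elapsed = time.perf_counter() - t0
    return nbytes * repeats / elapsed / 1e9

# Host-to-host memcpy as a placeholder for a real device transfer.
gbps = measure_throughput(lambda d, s: np.copyto(d, s), 64 * 1024 * 1024)
print(f"aggregated throughput: {gbps:.1f} GB/s")
```

A production benchmark would run the same loop over every link (and combinations of links, for the aggregated case) and write the results to the profile database.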
The benchmark measures the aggregated throughput when multiple links are utilized. Profiling information is stored in tables in a database. Based on the profiling information, the resource allocator places the job on the GPUs that maximize performance (e.g., minimize the application execution time). Profiling every communication pair may not be scalable: with 16 GPUs there are 16C1 + 16C2 + 16C3 + … subsets to measure.
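The scalability concern can be made concrete by counting the subsets. Assuming a 16-GPU system (as the 16Ck terms suggest), the number of non-empty GPU subsets a full profile would have to cover is:

```python
from math import comb

# 16C1 + 16C2 + ... + 16C16 = 2**16 - 1 non-empty subsets of 16 GPUs,
# which is why exhaustively profiling every combination does not scale.
subsets = sum(comb(16, k) for k in range(1, 17))
print(subsets)  # -> 65535
```

This exponential growth motivates Solution 2, which relies on the topology alone rather than exhaustive throughput profiling.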
Proposed Solution 2 (Specific solution, with topology only, without throughput profiling)
A micro-benchmark can run any time to find the topology. It...