Browse Prior Art Database

A Method and System for Developing Communication-Optimized Distributed Learning Infrastructure over Clusters of Multi-GPU Nodes

IP.com Disclosure Number: IPCOM000248024D
Publication Date: 2016-Oct-19
Document File: 1 page(s) / 23K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is an approach to reduce the communication costs associated with training across multiple Graphics Processing Units (GPUs). The solution builds a communication-optimized distributed learning algorithm that exploits clusters of nodes with multiple GPUs in each node.




A method is needed to reduce communication costs associated with multiple Graphics Processing Units (GPUs).

The disclosure describes an approach for building a communication-optimized distributed learning algorithm that exploits clusters of nodes with multiple GPUs in each node.

The core idea of the new algorithm is to partition communication into two phases: one phase communicates scalar values, while the other aggregates neural network parameters. The algorithm reduces communication costs by changing both the frequency and the type of communication.

The new algorithm is designed to scale over a cluster of nodes connected via a slower network, with multiple GPUs in each node. The cluster can be hosted in a cloud environment or built on-site. The algorithm uses a two-phase communication strategy in which communication over the slower inter-node network operates only on scalar data, while communication involving large data (such as parameter tensors) is carried out within a node, between its GPUs.
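The two-phase strategy above can be sketched as follows. This is an illustrative simulation, not the disclosure's implementation: the function names (`slow_exchange_scalars`, `fast_allreduce_params`) and the specific scalars exchanged are assumptions, and plain Python lists stand in for the real inter-node and intra-node transports.

```python
import numpy as np

def slow_exchange_scalars(local_scalars):
    """Phase 1: the slow inter-node link carries only small scalar
    values (here, one scalar per node, averaged), so its cost stays
    tiny regardless of model size."""
    return sum(local_scalars) / len(local_scalars)

def fast_allreduce_params(node_param_replicas):
    """Phase 2: large parameter tensors are averaged only among the
    GPUs inside a single node, over the fast intra-node interconnect."""
    return np.mean(np.stack(node_param_replicas), axis=0)

# Two nodes, two GPUs each; each GPU holds a parameter replica.
node0 = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
node1 = [np.array([5.0, 6.0]), np.array([7.0, 8.0])]

# Phase 2 (large data, fast links): per-node parameter aggregation.
p0 = fast_allreduce_params(node0)   # [2.0, 3.0]
p1 = fast_allreduce_params(node1)   # [6.0, 7.0]

# Phase 1 (small data, slow link): only scalars cross node boundaries.
global_loss = slow_exchange_scalars([0.9, 1.1])   # 1.0
```

The point of the split is that the payload size is matched to the link speed: bulky tensors never traverse the slow network, and the slow network never waits on anything larger than a few scalars.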

Further, the solution reduces multi-GPU communication costs by:

1. Replacing costly collective functions with point-to-point functions

2. Reducing the frequency of communication by communicating only after multiple iterations

3. Overlapping communication and computation

4. Using half-precision data to reduce communication volume
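Two of the techniques above can be illustrated concretely: communicating only every few iterations (item 2) and casting data to half precision before sending (item 4). The sketch below is an assumption-laden toy, not the disclosure's code; the interval `COMM_EVERY`, the gradient accumulation scheme, and `compress_fp16` are illustrative names.

```python
import numpy as np

COMM_EVERY = 4  # item 2: communicate only after several iterations

def compress_fp16(grad):
    """Item 4: cast to half precision before sending, halving the
    bytes on the wire relative to float32."""
    return grad.astype(np.float16)

accumulated = np.zeros(3, dtype=np.float32)
sent_payloads = []

for step in range(8):
    grad = np.ones(3, dtype=np.float32)  # stand-in for a real gradient
    accumulated += grad                  # fold updates locally
    if (step + 1) % COMM_EVERY == 0:
        sent_payloads.append(compress_fp16(accumulated))
        accumulated[:] = 0.0             # reset after communicating

# 8 iterations produce only 2 communications, each at half the width:
# sent_payloads[0].nbytes is 6 bytes versus 12 for float32.
```

Both levers trade a small amount of numerical fidelity or staleness for a large reduction in bytes sent, which is exactly the regime where a slow inter-node network dominates training time.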

The algorithm improves the scalability of the original algorithm...