A Method and System for Developing Communication-Optimized Distributed Learning Infrastructure over Clusters of Multi-GPU Nodes
Publication Date: 2016-Oct-19
The IP.com Prior Art Database
Disclosed is an approach to reducing the communication costs associated with training over multiple Graphics Processing Units (GPUs). The solution builds a communication-optimized distributed learning algorithm that exploits clusters of nodes with multiple GPUs in each node.
A method is needed to reduce communication costs associated with multiple Graphics Processing Units (GPUs).
The disclosure describes an approach for building a communication-optimized distributed learning algorithm that exploits clusters of nodes with multiple GPUs in each node.
The core idea of the new algorithm is to partition the communication into two phases: one phase communicates scalar values, while the other aggregates neural network parameters. The algorithm reduces communication costs by changing both the frequency and the type of communication.
The new algorithm is designed to scale over a cluster of nodes connected via a slower network, with each node equipped with multiple GPUs. The cluster can be hosted in a cloud environment or built on-site. The algorithm uses a two-phase communication strategy in which communication over the slower inter-node network operates only on scalar data, while communication involving large datasets is confined within a node, between its GPUs.
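The two-phase strategy can be illustrated with a minimal sketch. This is not the disclosed implementation: the "GPUs" are simulated as NumPy arrays, the intra-node reduction stands in for fast GPU-to-GPU links, and the scalar exchanged across nodes (a gradient norm here) is an assumed example of the kind of scalar data the slow network would carry.

```python
import numpy as np

def intra_node_reduce(gpu_grads):
    """Phase 1: average the large gradient tensors held by the GPUs
    of one node. In practice this traffic stays on fast in-node links."""
    return np.mean(gpu_grads, axis=0)

def inter_node_scalar_exchange(node_scalars):
    """Phase 2: combine one scalar per node over the slow network,
    e.g. a loss value or gradient-norm statistic."""
    return sum(node_scalars) / len(node_scalars)

# Two nodes, each with two GPUs holding 4-element gradient vectors.
node_a = [np.array([1., 2., 3., 4.]), np.array([3., 2., 1., 0.])]
node_b = [np.array([0., 0., 4., 4.]), np.array([4., 4., 0., 0.])]

local_a = intra_node_reduce(node_a)   # large-tensor traffic stays in-node
local_b = intra_node_reduce(node_b)

# Only one scalar per node (here, a gradient norm) crosses the slow network.
global_scale = inter_node_scalar_exchange(
    [np.linalg.norm(local_a), np.linalg.norm(local_b)])
```

The point of the split is that the slow network carries O(1) data per node per exchange, while the O(model size) traffic never leaves a node.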
Further, the solution reduces multi-GPU communication costs by:
1. Replacing costly collective functions with point-to-point functions
2. Reducing the frequency of communication by communicating only after multiple iterations
3. Overlapping communication with computation
4. Using half-precision data to reduce message size
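Techniques 1 and 4 above can be sketched together. The following is an illustrative simulation, not the disclosed code: a simplified chain-style reduction composed of point-to-point transfers (in place of a single collective call), with gradients cast to half precision (float16) before each hop. The ring layout, worker values, and function name are assumptions; real code would use an interconnect library such as MPI or NCCL, and techniques 2 and 3 (reduced frequency, overlap) would be layered on top.

```python
import numpy as np

def chain_reduce_fp16(grads):
    """Sum per-worker gradient vectors by passing half-precision
    values along a logical chain of point-to-point hops, then
    'broadcast' the averaged result back to every worker."""
    n = len(grads)
    acc = grads[0].astype(np.float16)          # half precision on the wire
    for rank in range(1, n):                   # one point-to-point hop each
        acc = acc + grads[rank].astype(np.float16)
    result = acc.astype(np.float32) / n        # back to full precision
    return [result.copy() for _ in range(n)]   # return leg of the chain

# Three workers, each holding a constant 4-element gradient (1.0, 2.0, 3.0).
workers = [np.full(4, float(i + 1), dtype=np.float32) for i in range(3)]
reduced = chain_reduce_fp16(workers)
```

Casting to float16 halves the bytes moved per hop, at the cost of precision; the small integer values used here reduce exactly, but production use would need loss-scaling or error-feedback safeguards.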
The algorithm improves the scalability of the original algorithm...