
Method to balance interprocess high performance communication

IP.com Disclosure Number: IPCOM000238999D
Publication Date: 2014-Sep-30
Document File: 3 page(s) / 97K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a wiring strategy that oversubscribes parts of the compute cluster to reduce interconnect costs. The proposed solution then recommends placing jobs, based on their inter-compute-node message-passing attributes, on compute node-to-switch connections in parts of the cluster that are not oversubscribed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 50% of the total text.



InfiniBand fabric topologies often require adding more uplink connections than host-side connections on a given switch in order to balance the fabric (see figures below). A balanced fabric provides higher performance for the overall solution and is therefore the recommended approach when configuring solutions with multi-tiered fabrics. The problem is that a multi-tiered fabric adds switch and cable interconnect costs to the overall solution, even though many applications do not require such a robust interconnect.

The novel contribution is a wiring strategy that oversubscribes parts of the compute cluster to reduce interconnect costs. The proposed solution then recommends placing jobs, based on their inter-compute-node message-passing attributes, on compute node-to-switch connections in parts of the cluster that are not oversubscribed. That is, jobs that require greater interprocessor communication are allocated compute nodes on switches that have more uplinks than host links, because these paths offer greater bandwidth.
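The placement rule above can be illustrated with a minimal sketch. It is not the disclosure's implementation: the `EdgeSwitch` type, the switch names, the port counts, and the `preferred_switches` helper are all hypothetical, and every switch is assumed to have at least one uplink. Each switch is rated by its ratio of host links to uplinks, and communication-heavy jobs are steered toward the lowest ratio.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class EdgeSwitch:
    """Hypothetical description of one edge switch (not from the disclosure)."""
    name: str
    host_links: int  # ports cabled to compute nodes
    uplinks: int     # ports cabled to the next switch tier (assumed > 0)

    @property
    def oversubscription(self) -> Fraction:
        """Host links per uplink; a value above 1 means oversubscribed."""
        return Fraction(self.host_links, self.uplinks)


def preferred_switches(switches, comm_heavy):
    """Return switch names in order of preference for a job.

    Communication-heavy jobs prefer the lowest host-to-uplink ratio.
    Other jobs prefer the highest ratio, which keeps the
    high-bandwidth switches free.
    """
    ranked = sorted(
        switches,
        key=lambda s: s.oversubscription,
        reverse=not comm_heavy,
    )
    return [s.name for s in ranked]


# Illustrative 36-port switches; the port splits are assumptions.
switches = [
    EdgeSwitch("edge-a", host_links=24, uplinks=12),  # 2:1 oversubscribed
    EdgeSwitch("edge-b", host_links=18, uplinks=18),  # 1:1
    EdgeSwitch("edge-c", host_links=10, uplinks=26),  # more uplinks than hosts
]

print(preferred_switches(switches, comm_heavy=True))
print(preferred_switches(switches, comm_heavy=False))
```

With these assumed port counts, a communication-heavy job is offered `edge-c` first, while a job with little inter-node traffic is offered `edge-a` first.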

This strategy involves analyzing the network/fabric topology to identify where performance can be improved by exploiting that topology. Achieving a balanced fabric often requires adding more cables and connections than the workload needs, solely to meet the blocking ratio requirement. Adaptive routing software (SW) can handle some of the issues associated with an unbalanced topology by routing interprocess communication among multiple compute nodes through the multi-tiered network configuration. However, this method is not as efficient as adding the extra connections, because traffic passes through more switches than necessary, which raises interprocess latency. In addition, because different compute nodes traverse a different number of switch tiers, inter-compute-node latency differs for parallel-processing applications executing in different parts of the cluster. Lastly, the adaptive routing software adds licensing cost to the solution as well as processing overhead.

To avoid these problems, the scheduler can communicate with the management entity to determine which parts of the compute configuration are better suited for balanced inter-compute-node communication. The management server can then take advantage of higher-bandwidth fabric segments within the overall configuration by allocating jobs that require more interprocessor communication to the servers and switches in those segments.
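One way to picture the scheduler's exchange with the management entity is the sketch below. The `FabricManager` and `Scheduler` classes, their methods, and the sample topology are hypothetical and stand in for whatever interfaces a real cluster exposes. The policy shown is also an assumption: communication-heavy jobs wait when no high-bandwidth nodes are free, and other jobs fill oversubscribed switches first.

```python
from typing import Dict, Iterable, List, Optional


class FabricManager:
    """Hypothetical stand-in for the management entity."""

    def __init__(self, topology: Dict[str, dict]):
        # topology maps switch name -> {"uplinks": int, "hosts": [node names]}
        self._topology = topology

    def all_switches(self) -> List[str]:
        return list(self._topology)

    def high_bandwidth_switches(self) -> List[str]:
        """Switches that have more uplinks than host links."""
        return [
            name for name, sw in self._topology.items()
            if sw["uplinks"] > len(sw["hosts"])
        ]

    def hosts_on(self, switch: str) -> List[str]:
        return list(self._topology[switch]["hosts"])


class Scheduler:
    """Hypothetical scheduler that asks the manager where to place a job."""

    def __init__(self, manager: FabricManager, busy: Iterable[str] = ()):
        self._manager = manager
        self._busy = set(busy)

    def allocate(self, nodes_needed: int, comm_heavy: bool) -> Optional[List[str]]:
        fast = self._manager.high_bandwidth_switches()
        slow = [s for s in self._manager.all_switches() if s not in fast]
        # Communication-heavy jobs use only high-bandwidth segments;
        # other jobs fill oversubscribed switches first, then fall back.
        order = fast if comm_heavy else slow + fast
        free = [
            node
            for switch in order
            for node in self._manager.hosts_on(switch)
            if node not in self._busy
        ]
        if len(free) < nodes_needed:
            return None  # not enough suitable nodes; the job waits
        chosen = free[:nodes_needed]
        self._busy.update(chosen)
        return chosen


manager = FabricManager({
    "edge-a": {"uplinks": 2, "hosts": ["a1", "a2", "a3", "a4"]},  # oversubscribed
    "edge-b": {"uplinks": 6, "hosts": ["b1", "b2", "b3"]},        # high bandwidth
})
scheduler = Scheduler(manager)

first = scheduler.allocate(2, comm_heavy=True)    # lands on edge-b
second = scheduler.allocate(3, comm_heavy=False)  # lands on edge-a
third = scheduler.allocate(2, comm_heavy=True)    # only b3 is left, so None
print(first, second, third)
```

Returning `None` rather than spilling a communication-heavy job onto an oversubscribed switch is a design choice made for this sketch; a production scheduler might queue, preempt, or relax the constraint instead.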

Balancing the fabric ensures predictable performance. If there are x edge switches with 18 nodes each and one with only 10, then the average number of hops between devices is not consistent. The switches with 18 nodes benefit more from locality than the one with only 10 nodes. This is true regardless of whether there are more than 10 uplinks. The better approach is to evenly spread th...
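The locality argument in the 18-node versus 10-node example can be checked with a short calculation. The sketch rests on assumptions that are not in the text: a two-tier fabric in which a message between nodes on the same edge switch crosses one switch and a message between nodes on different edge switches crosses three (edge, spine, edge), x = 3, and traffic spread uniformly over all peers. The function name is hypothetical.

```python
def mean_switches_traversed(nodes_per_switch, switch_index):
    """Mean switches crossed from a node on one edge switch to a random peer.

    Assumes a two-tier fabric: 1 switch for a peer on the same edge
    switch, 3 switches (edge, spine, edge) for any other peer.
    """
    total_nodes = sum(nodes_per_switch)
    local_peers = nodes_per_switch[switch_index] - 1
    remote_peers = total_nodes - nodes_per_switch[switch_index]
    return (local_peers * 1 + remote_peers * 3) / (total_nodes - 1)


# Assumed layout: x = 3 edge switches with 18 nodes, plus one with 10.
layout = [18, 18, 18, 10]

from_full_switch = mean_switches_traversed(layout, 0)   # 155/63, about 2.46
from_small_switch = mean_switches_traversed(layout, 3)  # 171/63, about 2.71
print(round(from_full_switch, 2), round(from_small_switch, 2))
```

Under these assumptions a node on an 18-node switch reaches its peers through fewer switches on average than a node on the 10-node switch, and the number of uplinks does not enter the calculation.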