
A Method for Message Passing Interface (MPI) Process Mapping to Minimize the Communication Latency on Symmetric Multiprocessor (SMP) and Multicore Architectures

IP.com Disclosure Number: IPCOM000192774D
Original Publication Date: 2010-Feb-02
Included in the Prior Art Database: 2010-Feb-02
Document File: 2 page(s) / 98K

Publishing Venue

IBM

Abstract

Disclosed is a method for effectively mapping heavily communicating Message Passing Interface (MPI) processes to a node on a cluster of single-core or multi-core Symmetric Multiprocessors (SMPs) in order to reduce communication overhead. The method further maps the MPI processes to cores that are close to each other within a node, significantly reducing communication overhead.



A method is disclosed for effectively mapping heavily communicating Message Passing Interface (MPI) processes to a node on a cluster of single-core or multi-core Symmetric Multiprocessors (SMPs) in order to reduce communication overhead. The method also maps the MPI processes to cores that are close to each other within a node, further reducing communication overhead.

The method disclosed herein extends a compiler-based communication analysis technique, which effectively maps MPI processes at the inter-node level, to the intra-node level: it further maps the MPI processes to specific cores within each node.

Consider an example where 16 processes are to be launched onto 2 nodes, each node containing two quad-core processors (eight cores per node). With the compiler-based communication analysis technique, the mapping at the inter-node level is determined to be as follows:

Node1: 0, 2, 4, 6, 8, 10, 12, 14

Node2: 1, 3, 5, 7, 9, 11, 13, 15

where the numbers 0, 1, 2, …, 15 are the ranks of the MPI processes.
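Such a node-level placement can be observed at run time. The following is a minimal illustrative sketch, not part of the original disclosure, which uses the standard MPI-3 call MPI_Comm_split_type with MPI_COMM_TYPE_SHARED to discover which ranks of MPI_COMM_WORLD share a node:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int world_rank, node_rank, len;
        char name[MPI_MAX_PROCESSOR_NAME];
        MPI_Comm node_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Group the ranks of MPI_COMM_WORLD by shared-memory domain:
           all ranks on the same node end up in the same node_comm. */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);
        MPI_Comm_rank(node_comm, &node_rank);

        MPI_Get_processor_name(name, &len);
        printf("world rank %d is local rank %d on node %s\n",
               world_rank, node_rank, name);

        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }

With the mapping above, ranks 0, 2, 4, 6, 8, 10, 12, and 14 would report the same node name, and likewise for the odd ranks.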

In order to find the cores that are close to each other on a node, two copies of a simple MPI latency-measuring application are executed on different combinations of cores. Based on the measured latencies, it is found that on node1 cores 0, 1, 2, 3 are close to each other and cores 4, 5, 6, 7 are close to each other; these groups are represented as (0,1,2,3) and (4,5,6,7). A sketch of such a latency probe appears below.
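The latency probe can be as simple as a two-rank ping-pong. The following minimal sketch is not from the original disclosure; it assumes Linux, pins each of the two MPI ranks to a core number given on the command line via sched_setaffinity, and reports the average one-way latency for a one-byte message:

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define ITERS 10000

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* argv[1] and argv[2] name the two cores under test; run
           with exactly two ranks on the same node. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(atoi(argv[1 + rank]), &set);
        sched_setaffinity(0, sizeof(set), &set);  /* pin this rank */

        char byte = 0;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("cores %s,%s: one-way latency %.3f us\n",
                   argv[1], argv[2],
                   (t1 - t0) / (2.0 * ITERS) * 1e6);

        MPI_Finalize();
        return 0;
    }

Launching this probe (for example, mpirun -np 2 ./latprobe 0 3, assuming a typical MPI launcher) for each pair of cores on a node yields the pairwise latencies from which the core groups (0,1,2,3) and (4,5,6,7) are derived.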

Thereafter, a graph partitioning algorithm is applied at the intra-node level to determine he...
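Although the disclosure's own partitioning algorithm is not reproduced here, the intra-node step can be pictured as splitting a node's eight local ranks across the two core groups so that heavily communicating rank pairs stay within one group. The following is an illustrative greedy sketch only, not the disclosed algorithm; it assumes a hypothetical 8x8 message-volume matrix comm[][] among the node's local ranks, as a compiler-based communication analysis might produce:

    #include <stdio.h>

    #define NRANKS  8
    #define GROUPSZ 4

    /* comm[i][j]: hypothetical message volume between local ranks
       i and j, filled in by the communication analysis. */
    static const int comm[NRANKS][NRANKS] = { { 0 } };

    int main(void)
    {
        int group[NRANKS];      /* group id per rank, -1 = unassigned */
        int size[2] = { 0, 0 }; /* current size of each core group    */
        for (int i = 0; i < NRANKS; i++) group[i] = -1;

        /* Greedily place the heaviest-communicating unassigned pair
           into whichever core group still has room for both ranks. */
        for (int placed = 0; placed < NRANKS; placed += 2) {
            int bi = -1, bj = -1, best = -1;
            for (int i = 0; i < NRANKS; i++)
                for (int j = i + 1; j < NRANKS; j++)
                    if (group[i] < 0 && group[j] < 0 &&
                        comm[i][j] > best) {
                        best = comm[i][j]; bi = i; bj = j;
                    }
            int g = (size[0] + 2 <= GROUPSZ) ? 0 : 1;
            group[bi] = group[bj] = g;
            size[g] += 2;
        }

        /* group 0 -> cores (0,1,2,3); group 1 -> cores (4,5,6,7) */
        for (int i = 0; i < NRANKS; i++)
            printf("local rank %d -> core group %d\n", i, group[i]);
        return 0;
    }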