D. Payne, Intel SSD
L. Shuler, Sandia National Laboratories
R. van de Geijn, University of Texas at Austin
J. Watts, California Institute of Technology
This page describes the second release of the Interprocessor Collective Communications (InterCom) Library, iCC release R2.1.0. This library is the result of an ongoing collaboration between David Payne (Intel SSD), Lance Shuler (Sandia National Laboratories), Robert van de Geijn (University of Texas at Austin), and Jerrell Watts (California Institute of Technology), funded by the Intel Research Council and Intel SSD. Previous contributors to this effort include Mike Barnett (Univ. of Idaho), Satya Gupta (Intel SSD), Rik Littlefield (PNL), and Prasenjit Mitra (now with Oracle).
The library implements a comprehensive approach to collective communication. The results are best summarized by the performance tables below, in which each entry gives the ratio of the time required by the indicated library (NX, BLACS, or MPI) to the time required by iCC for the given message size, so larger numbers indicate a larger advantage for iCC:
Broadcast
   bytes   NX/iCC   BLACS/iCC   MPI/iCC
-----------------------------------------
      16      1.4         1.0       1.6
    1024      1.5         1.0       2.5
   65536      5.5         2.9       2.8
 1048576     11.3         6.1       7.5
Sum-to-All
   bytes   NX/iCC   BLACS/iCC   MPI/iCC
-----------------------------------------
      16      1.0         1.2       2.1
    1024      1.0         1.0       2.0
   65536     21.1         4.1       6.9
 1048576     34.6         5.9      11.8
Attaining this performance improvement is as easy as linking in a library that automatically translates NX collective communication calls to iCC calls. Furthermore, the iCC library provides additional functionality, such as scatter and gather operations and more general "gopf" combine operations.
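To make the link-time substitution concrete, the sketch below shows an ordinary NX-style program that performs a global sum. The routine name gdsum and its (vector, length, workspace) prototype follow the usual NX convention but are given here from memory as an assumption for illustration; they are not quoted from the iCC release itself. The point is that such code needs no source changes: relinking it so that the iCC compatibility library satisfies the NX collective calls is, per the description above, all that is required.

    /* Illustrative NX-style fragment; the gdsum prototype below is an
     * assumption based on the usual NX convention, not taken from the iCC
     * release notes.  Relinking against the iCC compatibility library is
     * described above as sufficient to redirect this collective call to
     * iCC; the application source is unchanged. */
    #include <stdio.h>

    /* Assumed NX prototype: element-wise double-precision sum across nodes. */
    void gdsum(double x[], long n, double work[]);

    #define N 1024

    int main(void)
    {
        double x[N], work[N];
        long   i;

        for (i = 0; i < N; i++)
            x[i] = 1.0;            /* each node contributes a vector of ones */

        gdsum(x, (long) N, work);  /* global sum; result replaces x on every node */

        printf("x[0] after global sum: %f\n", x[0]);
        return 0;
    }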
As had been planned, an MPI-like group interface to iCC is now available. The interface lets the user create and free groups and communicators, and it gives user-defined groups complete access to the high-performance routines in the iCC library.
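Since the iCC entry points themselves are not listed on this page, the sketch below uses standard MPI-1 calls to illustrate the group/communicator model that the interface is described as mirroring: derive a group from an existing communicator, select a subgroup, create a communicator for it, issue a collective call on that communicator, and free the objects afterward. All names shown are MPI's own and are offered only as an analogy for the "MPI-like" model; iCC's actual routine names may differ.

    /* Analogy only: genuine MPI-1 group/communicator calls, illustrating the
     * create/use/free pattern implied by an "MPI-like" group interface.
     * These are MPI routines, not iCC entry points. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Group world_group, even_group;
        MPI_Comm  even_comm;
        int       size, i, nranks = 0;
        int       ranks[128];
        double    buf = 1.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Build a group containing the even-numbered ranks. */
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);
        for (i = 0; i < size && nranks < 128; i += 2)
            ranks[nranks++] = i;
        MPI_Group_incl(world_group, nranks, ranks, &even_group);

        /* Create a communicator for that group and broadcast within it. */
        MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);
        if (even_comm != MPI_COMM_NULL) {
            MPI_Bcast(&buf, 1, MPI_DOUBLE, 0, even_comm);
            MPI_Comm_free(&even_comm);
        }

        /* Release the group objects and shut down. */
        MPI_Group_free(&even_group);
        MPI_Group_free(&world_group);
        MPI_Finalize();
        return 0;
    }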
We would like to note that this library is not intended to compete with MPI. It began as a research project into the techniques required to develop high-performance implementations of the MPI collective communication calls. We are making the library available as a service to the user community, in the hope that these techniques will eventually be incorporated into efficient MPI implementations.