On Apr 14, 2008, at 9:45 PM, Jeho wrote:
> Analyzing performance of tcp and lamd rpi on a Linux cluster (Linux
> Kernel 2.4.21-52), I've found that for an MPI application with
> (only) collective communications completion of the application got
> slowed down with tcp rpi, but not with lamd rpi when they were run
> on a set of busy nodes. It is somewhat strange because what I
> understood from User's manual and the design documentation of RPI
> SSI module (1.0-ssi-rpi.pdf) was that the tcp and lamd rpi apply to
> point-to-point communications, not to collective communications. Is
> this a misunderstanding?
Your understanding is correct: the tcp and lamd RPI modules implement
point-to-point communication. But the collective communications are
layered on top of the point-to-point communication, so the RPI in use
will change the performance characteristics of collective communication.
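
To make the layering concrete, here is a rough sketch of how a reduce
can be built from nothing but point-to-point messages. It is a
plain-Python simulation (no MPI library involved), and the binomial
tree is only an assumed, commonly used pattern; it is not taken from
the LAM source, and the function names are made up for illustration.

```python
# Toy model: a reduce-to-rank-0 expressed purely as point-to-point
# (sender, receiver) pairs. The binomial tree is an assumed pattern,
# not necessarily what LAM's collective code does.

def binomial_reduce_schedule(nprocs):
    """Return a list of rounds; each round is a list of (sender, receiver)."""
    rounds = []
    step = 1
    while step < nprocs:
        pairs = []
        # In each round, every rank at a multiple of 2*step receives
        # from the rank `step` positions above it, if that rank exists.
        for rank in range(0, nprocs, 2 * step):
            partner = rank + step
            if partner < nprocs:
                pairs.append((partner, rank))
        rounds.append(pairs)
        step *= 2
    return rounds


def simulate_reduce(values, op=lambda a, b: a + b):
    """Combine one value per rank into rank 0 by following the schedule."""
    partial = list(values)
    for pairs in binomial_reduce_schedule(len(partial)):
        for sender, receiver in pairs:
            # Stands in for a send on `sender` matched by a receive on
            # `receiver`, followed by the local reduction operation.
            partial[receiver] = op(partial[receiver], partial[sender])
    return partial[0]
```

Every pair in the schedule is an ordinary send/receive, which is
exactly the traffic the RPI carries. That is why swapping the RPI
changes how a collective performs.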
> So... My questions are:
> 1) does tcp rpi module and lamd rpi module work differently for
> collective communications like MPI_Reduce?
> and
They will cause the collective communication to behave differently,
yes. The lamd RPI tends to give better overlap of computation and
communication, while the tcp RPI tends to give better raw performance.
MPI_REDUCE can be very sensitive to timing mismatches between
processes. It's possible that the better overlap of the lamd RPI is
helping compensate for the poor timing caused by high system load.
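
As a rough illustration of the timing sensitivity, here is a toy
model in plain Python. It assumes a binomial-tree reduce and a fixed
per-message latency, both of which are my simplifications and not
measurements or LAM internals. It does not model the difference
between the tcp and lamd RPIs; it only shows how one late process
holds up the whole operation.

```python
# Toy model: when does rank 0 finish a tree-structured reduce if the
# ranks enter the call at different times? The tree shape and the
# constant latency are assumptions made for illustration only.

def reduce_finish_time(arrival, latency=1.0):
    """arrival[i] is the time rank i enters the reduce; returns the
    time at which rank 0 holds the final result."""
    ready = list(arrival)
    n = len(ready)
    step = 1
    while step < n:
        for rank in range(0, n, 2 * step):
            partner = rank + step
            if partner < n:
                # The receiver cannot finish this step until both it
                # and its partner are ready, plus one message latency.
                ready[rank] = max(ready[rank], ready[partner]) + latency
        step *= 2
    return ready[0]
```

With four ranks all arriving at time 0, the model finishes at 2.0.
If rank 3 alone arrives at time 5 (say, because its node is busy),
the finish time becomes 7.0: the delay of a single process passes
straight through to the root.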
> 2) is there a more detailed specification, manual, or reference for
> tcp and lamd rpi implementation?
Just the code and the RPI SSI module documentation. Sorry :(.
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!