Hi,
While analyzing the performance of the tcp and lamd rpi on a Linux cluster
(Linux kernel 2.4.21-52), I found that an MPI application that uses only
collective communications was slowed down with the tcp rpi, but not with the
lamd rpi, when it was run on a set of busy nodes. This seems strange to me,
because my understanding from the User's manual and the design documentation
of the RPI SSI module (1.0-ssi-rpi.pdf) was that the tcp and lamd rpi apply
to point-to-point communications, not to collective communications. Have I
misunderstood something?
So, my questions are:
1) Do the tcp rpi module and the lamd rpi module work differently for
collective communications such as MPI_Reduce?
2) Is there a more detailed specification, manual, or reference for the tcp
and lamd rpi implementations?
Any helpful comments would be appreciated.
Jeho