Hi

While analyzing the performance of the tcp and lamd rpi modules on a Linux cluster (Linux kernel 2.4.21-52), I found that an MPI application that uses only collective communications was slowed down with the tcp rpi, but not with the lamd rpi, when run on a set of busy nodes. This seems strange, because my understanding from the User's manual and the design documentation of the RPI SSI module (1.0-ssi-rpi.pdf) was that the tcp and lamd rpi modules apply to point-to-point communications, not to collective communications. Have I misunderstood?

So, my questions are:
1) Do the tcp rpi module and the lamd rpi module work differently for collective communications such as MPI_Reduce?
2) Is there a more detailed specification, manual, or reference for the tcp and lamd rpi implementations?

Any helpful comment would be appreciated.

Jeho