Hi Brian,
Thank you for your answers. I appreciate them.
I have a few follow-up questions to make sure I understand them correctly. :)
> On Apr 14, 2008, at 9:45 PM, Jeho wrote:
>> Analyzing performance of tcp and lamd rpi on a Linux cluster (Linux
>> Kernel 2.4.21-52), I've found that for an MPI application with (only)
>> collective communications completion of the application got slowed down
>> with tcp rpi, but not with lamd rpi when they were run on a set of busy
>> nodes. It is somewhat strange because what I understood from User's
>> manual and the design documentation of RPI SSI module (1.0-ssi-rpi.pdf)
>> was that the tcp and lamd rpi apply to point-to-point communications,
>> not to collective communications. Is it misunderstanding?
>
> All you've said is true. But the collective communications are layered
> on top of the point-to-point communication. So the RPI in use will
> change the performance characteristics of collective communication.
Do you mean that a collective operation is carried out with the tcp module's
point-to-point operations when the tcp RPI is selected, and with the lamd
module's when the lamd RPI is selected?
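To check that I have the layering right, here is a small sketch of how I
picture it. This is my own simulation, not LAM/MPI code: the function name
tree_reduce_sum is hypothetical, the binomial tree is only an assumed
algorithm, and each "transfer" stands in for one point-to-point send/receive
pair that the selected RPI would carry.

```c
#include <assert.h>

/* Hypothetical illustration: sum-reduce n per-rank values to rank 0
 * using only pairwise transfers, arranged as a binomial tree.
 * vals[r] is the contribution of rank r; *transfers counts the
 * point-to-point messages that would be sent. */
int tree_reduce_sum(int *vals, int n, int *transfers)
{
    assert(n > 0);
    *transfers = 0;
    for (int mask = 1; mask < n; mask <<= 1) {
        /* In this round, every rank whose lowest set bit is `mask`
         * sends its partial sum to rank - mask. */
        for (int rank = mask; rank < n; rank += 2 * mask) {
            int dest = rank - mask;
            vals[dest] += vals[rank];   /* stands in for send + recv + add */
            (*transfers)++;
        }
    }
    return vals[0];
}
```

With eight ranks holding 1 through 8, this returns 36 after seven transfers,
so every step of the collective would go through whichever point-to-point
module is in use.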
>
>> So... My questions are:
>> 1) does tcp rpi module and lamd rpi module work differently for
>> collective communications like MPI_Reduce?
>> and
>
> They will cause the collective communication to behave differently, yes.
> The lamd RPI tends to have better overlap of computation and
> communication and the tcp RPI tends to have better raw performance.
> MPI_REDUCE can be very sensitive to timing mismatches. It's possible
> that the better overlap of the lamd is helping compensate for the poor
> timing due to high system load.
Do you mean that MPI_Reduce is more sensitive to timing mismatches than other
collective operations? If so, could you please explain why?
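My guess at what a timing mismatch does to a reduce is sketched below. Again
this is only my own model under assumptions, not how LAM/MPI is implemented:
tree_reduce_finish is a hypothetical name, I assume a binomial tree, and I
assume a fixed cost `hop` for each point-to-point step. arrival[r] is the
time at which rank r enters the reduce.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: time at which rank 0 holds the final result.
 * A receiver can only combine once both it and the sender are ready,
 * then pays `hop` for the transfer. Returns -1.0 if allocation fails. */
double tree_reduce_finish(const double *arrival, int n, double hop)
{
    assert(n > 0);
    double *ready = malloc((size_t)n * sizeof *ready);
    if (ready == NULL)
        return -1.0;
    for (int r = 0; r < n; r++)
        ready[r] = arrival[r];

    for (int mask = 1; mask < n; mask <<= 1) {
        for (int rank = mask; rank < n; rank += 2 * mask) {
            int dest = rank - mask;
            /* The step completes after the later of the two partners. */
            double start = ready[dest] > ready[rank] ? ready[dest] : ready[rank];
            ready[dest] = start + hop;
        }
    }

    double finish = ready[0];
    free(ready);
    return finish;
}
```

In this model, four ranks that all arrive at time 0 with hop = 1 finish at
time 2, but if only rank 3 arrives at time 10 the root finishes at time 12.
Is that the kind of sensitivity you mean?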
>> 2) is there a more detailed specification, manual, or reference for tcp
>> and lamd rpi implementation?
>
> Just the code and the RPI SSI module documentation. Sorry :(.
It's okay. :)
>
> Brian
>
> --
> Brian Barrett
> LAM/MPI Developer
> Make today a LAM/MPI day!
>
>
Thank you again.
Jeho Park