On Apr 16, 2008, at 5:48 AM, Jeho Park wrote:
>> All you've said is true. But the collective communications are
>> layered on top of the point-to-point communication. So the RPI in
>> use will change the performance characteristics of collective
>> communication.
> You mean the collective communications are to use tcp module when
> tcp RPI was chosen and to use lamd module when lamd RPI was chosen?
Yes.
>> They will cause the collective communication to behave
>> differently, yes.
>> The lamd RPI tends to have better overlap of computation and
>> communication and the tcp RPI tends to have better raw performance.
>> MPI_REDUCE can be very sensitive to timing mismatches. It's possible
>> that the better overlap of the lamd is helping compensate for the
>> poor timing due to high system load.
> Do you mean MPI_Reduce can be more sensitive to timing mismatch than
> other collective communications? If so, would you please explain it
> for me?
Not necessarily. All the collective operations, by definition, have
multiple processes involved. If there are timing mismatches between
the processes (e.g., some reach the collective late because of high
system load), the overall performance can be improved by using a
central buffering agent (the lamd). This will likely apply to all the
collective operations, not just MPI_Reduce.
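
To make the buffering argument a bit more concrete, here is a toy model
in Python. It is only a sketch, not LAM/MPI code and not a description
of how either RPI is implemented: the function names, the arrival
times, and the handoff cost are all made up for illustration. It
assumes that without buffering a sender stays blocked until the root
reaches the collective, and that with a buffering agent a sender pays
only a small fixed handoff cost.

```python
# Toy model of timing mismatch in a reduce-style collective.
# All names and numbers here are hypothetical, for illustration only.

def idle_time_direct(worker_ready, root_ready):
    """Total time senders spend blocked when each one must wait
    until the root has reached the collective."""
    return sum(max(0.0, root_ready - t) for t in worker_ready)


def idle_time_buffered(worker_ready, handoff_cost):
    """Total time senders spend blocked when a buffering agent accepts
    their data right away, at a fixed per-sender handoff cost."""
    return handoff_cost * len(worker_ready)


# Times (in seconds) at which each sender reaches the collective.
worker_ready = [1.0, 1.2, 0.9, 1.1]
# Assumed extra cost of going through the buffering agent.
handoff_cost = 0.05

# Compare a well-synchronized root with one delayed by system load.
for root_ready in (1.0, 3.0):
    direct = idle_time_direct(worker_ready, root_ready)
    buffered = idle_time_buffered(worker_ready, handoff_cost)
    print(f"root ready at {root_ready:.1f}s: "
          f"direct idle = {direct:.2f}s, buffered idle = {buffered:.2f}s")
```

Under these assumptions, the direct path wins when everyone arrives at
about the same time (the handoff cost is pure overhead), and the
buffered path wins once the root is badly delayed. That is the same
tradeoff as raw performance versus overlap described above.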
Be sure to check out these FAQ entries:
http://www.lam-mpi.org/faq/category5.php3#question19
http://www.lam-mpi.org/faq/category5.php3#question20
--
Jeff Squyres
Cisco Systems