
LAM/MPI General User's Mailing List Archives


From: Bogdan Costescu (bogdan.costescu_at_[hidden])
Date: 2004-05-21 12:11:51


On Thu, 20 May 2004, Malar Chinnusamy wrote:

> In my program which I run with 12 processes, with 1 process/node, all even
> processes send data (1.6*10^9 bytes) to odd processes.

So, reduced to 4 processes, does the communication pattern look like:

0->1 and 2->3

or

0->(1 and 3) and 2->(1 and 3) ?

> All the processes take ~136 sec to finish. This indicates that there
> is no congestion right...

1.6 GB / 136 s ~ 11.8 MB/s, which is close to the maximum that a Fast
Ethernet link can deliver (100 Mbit/s = 12.5 MB/s before protocol
overhead). So you are in fact bandwidth limited.
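
The arithmetic behind that estimate can be reproduced with a minimal sketch. It assumes 1 MB = 10^6 bytes and the nominal 100 Mbit/s Fast Ethernet line rate; the function name is made up for illustration.

```python
def throughput_mb_per_s(n_bytes, seconds):
    """Average throughput in MB/s, with 1 MB = 10**6 bytes."""
    return n_bytes / seconds / 1e6

# Nominal Fast Ethernet line rate (assumption: 100 Mbit/s, no protocol overhead).
FAST_ETHERNET_BITS_PER_S = 100e6
line_rate_mb_per_s = FAST_ETHERNET_BITS_PER_S / 8 / 1e6

# Message size and elapsed time taken from the quoted post.
observed = throughput_mb_per_s(1.6e9, 136.275022)

print(f"observed throughput: {observed:.2f} MB/s")
print(f"nominal line rate:   {line_rate_mb_per_s:.2f} MB/s")
print(f"link utilisation:    {observed / line_rate_mb_per_s:.0%}")
```

A utilisation this close to the line rate is what supports the "bandwidth limited" reading; real links lose a few percent to Ethernet, IP and TCP headers.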

> rank 0 time taken for receiving from process 2 is 136.275022 for 200000000
> rank 2 time taken is 136.266951 for 200000000

Is the second line the time measured on the sender side?

> rank 0 time taken for receiving from process 1 is 516.601192 for 200000000
> rank 1 time taken is 652.565886 for 200000000

If the answer to the above question is 'yes', I would suggest first
finding out why 2->0 has almost equal times on the sender and receiver
sides, while 1->0 shows a significant difference between sender and
receiver.
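
To put numbers on that comparison, here is a small sketch that subtracts the two quoted times for each pair. It assumes that the first line of each pair is the receiver-side time on rank 0 and the second line is the sender-side time, which is exactly the point still to be confirmed; the names are illustrative only.

```python
# (receiver_time, sender_time) in seconds, copied from the quoted output.
# Assumption: the second line of each pair is the sender-side measurement.
timings = {
    "2->0": (136.275022, 136.266951),
    "1->0": (516.601192, 652.565886),
}

def sender_receiver_gap(pair):
    """Sender time minus receiver time, in seconds."""
    recv, send = timings[pair]
    return send - recv

for pair in timings:
    print(f"{pair}: sender - receiver = {sender_receiver_gap(pair):+.3f} s")
```

Under that assumption the 2->0 gap is a few milliseconds, while the 1->0 gap is about 136 s, roughly the duration of one full transfer. That may hint that one transfer waited for another, although the timings alone do not prove it.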

Another question concerns the amount of data: how much memory do
you have on these nodes? Are you sure that the data set kept in any
node's memory (including rank 0) never exceeds the physical RAM?
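
A rough way to check this on one node is sketched below. It assumes that the 200000000 in the quoted output is an element count with 8 bytes per element (which matches the stated 1.6*10^9 bytes), that rank 0 may hold one buffer per incoming message, and that the system exposes physical memory through `sysconf` as Linux does. The 80% headroom factor and all function names are arbitrary illustrations.

```python
import os

def buffer_bytes(n_elements, element_size, n_buffers=1):
    """Total bytes held in memory for n_buffers message buffers."""
    return n_elements * element_size * n_buffers

def physical_ram_bytes():
    """Physical RAM in bytes, or None where sysconf does not expose it."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None

def fits_in_ram(needed_bytes, ram_bytes, headroom=0.8):
    """True if the buffers fit in a fraction `headroom` of physical RAM."""
    return needed_bytes <= headroom * ram_bytes

# Assumption: rank 0 keeps two receive buffers alive (one per sender).
needed = buffer_bytes(200_000_000, 8, n_buffers=2)
ram = physical_ram_bytes()
if ram is None:
    print(f"need {needed / 1e9:.1f} GB; physical RAM size not available here")
else:
    verdict = "fits" if fits_in_ram(needed, ram) else "may swap"
    print(f"need {needed / 1e9:.1f} GB of {ram / 1e9:.1f} GB RAM: {verdict}")
```

If the buffers do not fit, the node starts swapping and the measured times reflect disk speed rather than the network.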

> Can you please tell explain why this happens. Its only for large messages.

Is there some cut-off point? Or does the difference just increase as
the message size increases? If so, by how much?

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]