LAM/MPI General User's Mailing List Archives

From: Andrey Kharuk (andrey.kharuk_at_[hidden])
Date: 2005-11-15 22:02:01


Hi David,

Thank you very much for your response.
I used MPI_Barrier after MPI initialization in each program, before the
first MPI_Wtime, which comes before the Reduce operator or before the
Send/Recv block. I didn't use it before the last MPI_Wtime that follows
the code being measured.
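
For reference, here is a minimal sketch of that timing pattern (the
message size, root rank, and reduce operation are placeholders, not the
actual benchmark values):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, i;
    const int n = 1024;             /* placeholder message size */
    double *sendbuf, *recvbuf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sendbuf = malloc(n * sizeof(double));
    recvbuf = malloc(n * sizeof(double));
    for (i = 0; i < n; ++i)
        sendbuf[i] = (double)rank;

    /* Barrier right after initialization, before the first MPI_Wtime,
       so all ranks start timing from roughly the same point. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    /* The code being measured: here a single reduce to rank 0. */
    MPI_Reduce(sendbuf, recvbuf, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* No barrier before the last MPI_Wtime, as described above. */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("elapsed: %f s\n", t1 - t0);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}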

I thought about how to run the program on 16 physical processors. I
used nodes=8:ppn=2 for my PBS job, but I'm not sure it used separate
physical processors. When I use xpbsmon it says

virtual processors 0:1 (cpus=1)

for each node, and I haven't found how to start one process per
physical processor.
However, the results are quite a bit better. You can see them here:

http://www.atspec.co.nz/Andrey/Reduce1.htm

Cheers,
Andrey

>Hi Andrey,
>
>I have seen some bad results with HT on Xeon when running 2 identical
>processes on the same physical CPU...
>Have you tried running only 16 lamds, one per real physical CPU? If
>not, try it and compare the results before using 32 nodes.
>Have you tried doing an MPI_Barrier just before the first MPI_Wtime
>and just before the second MPI_Wtime? You could perhaps see whether
>it's a synchronisation problem.
>
>On another note, I have done some tests comparing send-receive with
>MPI_Gather and MPI_Scatter... using send-receive is about 5% faster
>for me on a small cluster of 2 to 14 P4 nodes (2.4 GHz, Gigabit
>Ethernet), and I can't explain that...
>I would have thought the result would be the same in the two cases.

>Regards
>
>David
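
As a rough illustration of the comparison David describes (shown here in
the gather direction only), the sketch below moves one block per rank to
rank 0, once with MPI_Gather and once with explicit send/receive; the
block size is a placeholder, not the size used in his tests:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, i, src;
    const int n = 1024;              /* placeholder block size per rank */
    double *block, *all;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    block = malloc(n * sizeof(double));
    all   = malloc((size_t)n * size * sizeof(double));
    for (i = 0; i < n; ++i)
        block[i] = (double)rank;

    /* Variant 1: collective gather to rank 0. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    MPI_Gather(block, n, MPI_DOUBLE, all, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("MPI_Gather: %f s\n", t1 - t0);

    /* Variant 2: the same data movement with explicit send/receive. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    if (rank == 0) {
        for (i = 0; i < n; ++i)
            all[i] = block[i];       /* rank 0 keeps its own block */
        for (src = 1; src < size; ++src)
            MPI_Recv(all + (size_t)src * n, n, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, &status);
    } else {
        MPI_Send(block, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("Send/Recv:  %f s\n", t1 - t0);

    free(block);
    free(all);
    MPI_Finalize();
    return 0;
}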