LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-11-19 12:58:47


Sorry for the delay in replying here -- we were totally swamped with SC.

In general, I have not heard of good experiences with hyperthreading in
HPC. I'm more of a software person than a hardware person, so I won't
try to delve any deeper than that; you might want to google around to
see what other people's experiences are with hyperthreaded processes
that run at/close to 100% of the CPU. In short, based on my
[admittedly limited] experience with hyperthreading, I would expect
lower performance for your 32-process/8-node run vs. your
16-process/8-node run.

Depending on your PBS setup: if you asked for nodes=8:ppn=2 and the
nodes aren't shared, you should get all 16 processors for your
exclusive use. You'll need to check with your local
setup/administrator to be sure.
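
One easy sanity check is to have every rank report the host it is
running on; with nodes=8:ppn=2 and 16 processes you should see exactly
two ranks per hostname. A minimal sketch (nothing in it is specific to
your setup):

    #include <stdio.h>
    #include <mpi.h>

    /* Each rank reports the host it is running on, so you can check
       that the scheduler really gave you two slots on each of 8 nodes. */
    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);
        printf("rank %d of %d on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }

Note that this only shows which node each rank landed on, not whether
it was bound to a physical or a hyperthreaded (logical) CPU.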

Does that help?

On Nov 15, 2005, at 10:02 PM, Andrey Kharuk wrote:

> Hi David,
>
> Thank you very much for your response.
> I used MPI_Barrier after MPI initialization in each program and
> before the first MPI_Wtime, which comes before the Reduce operation
> or the Send/Recv block. And I didn't use it before the final
> MPI_Wtime that follows the code being measured.
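
For reference, a minimal sketch of that measurement pattern, with an
arbitrary 1024-double Reduce to rank 0 standing in for the code being
measured:

    #include <stdio.h>
    #include <mpi.h>

    #define N 1024  /* arbitrary buffer size */

    int main(int argc, char **argv)
    {
        double in[N], out[N], t0, t1;
        int i, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (i = 0; i < N; i++)
            in[i] = (double) rank;

        /* Synchronize all ranks, then start the clock. */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();

        MPI_Reduce(in, out, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        /* No barrier before the final timestamp, as described above. */
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("reduce: %f s\n", t1 - t0);

        MPI_Finalize();
        return 0;
    }
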
>
> I thought about how to run the program on 16 physical processors. I
> used nodes=8:ppn=2 in my PBS request, but I'm not sure it used
> separate physical processors for my job. When I use xpbsmon it says
>
> virtual processors 0:1 (cpus=1)
>
> for each node. And I haven't found a way to start one process per
> physical processor.
> However, the results are quite a bit better. You can see them here:
>
> http://www.atspec.co.nz/Andrey/Reduce1.htm
>
> Cheers,
> Andrey
>
>
>> Hi Andrey,
>>
>> I have had some bad results with HT on Xeons when running 2
>> identical processes on the same physical CPU...
>> Have you tried running only 16 lamds, one per real physical CPU? If
>> not, try it and compare the results before going to 32 processes.
>> Have you tried doing an MPI_Barrier just before the first MPI_Wtime
>> and just before the second MPI_Wtime? That way you could perhaps
>> see whether it's a synchronization problem.
>>
>> On another note, I have done some tests comparing send-receive
>> against MPI_Gather and MPI_Scatter... for me, using send-receive is
>> faster (by 5%) on a small cluster of 2 to 14 P4 nodes (2.4 GHz,
>> Gigabit Ethernet), and I can't explain that... I would have
>> expected the two approaches to perform the same.
>>
>> Regards
>>
>> David
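
For what it's worth, a comparison like David's can be reproduced with
a sketch along these lines; the per-rank payload size is an arbitrary
placeholder, and the send/recv variant performs the same data movement
as the MPI_Gather call:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define COUNT 1024  /* per-rank payload size, arbitrary */

    int main(int argc, char **argv)
    {
        int rank, size, i;
        double t0, t1;
        double *sendbuf, *recvbuf = NULL;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        sendbuf = malloc(COUNT * sizeof(double));
        for (i = 0; i < COUNT; i++)
            sendbuf[i] = (double) rank;
        if (rank == 0)
            recvbuf = malloc((size_t) size * COUNT * sizeof(double));

        /* Variant 1: the collective. */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        MPI_Gather(sendbuf, COUNT, MPI_DOUBLE,
                   recvbuf, COUNT, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("gather:    %f s\n", t1 - t0);

        /* Variant 2: the same gather done with point-to-point calls. */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        if (rank == 0) {
            for (i = 0; i < COUNT; i++)
                recvbuf[i] = sendbuf[i];  /* root's own block */
            for (i = 1; i < size; i++)
                MPI_Recv(recvbuf + (size_t) i * COUNT, COUNT, MPI_DOUBLE,
                         i, 0, MPI_COMM_WORLD, &st);
        } else {
            MPI_Send(sendbuf, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("send/recv: %f s\n", t1 - t0);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }

In practice you would repeat each variant many times and take an
average or minimum; a single iteration is mostly noise.
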
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/