
LAM/MPI General User's Mailing List Archives


From: Pak, Anne O (anne.o.pak_at_[hidden])
Date: 2003-05-27 18:24:41


[snipped some more...]
{Jeff Squyres wrote:}

There is some initial overhead, but for <10 slave processes on a LAM in an
existing LAM universe, it shouldn't take 20-30 seconds.

IT SEEMS YOU ARE SAYING THAT THE AMOUNT OF OVERHEAD IS CLOSELY TIED TO THE
NUMBER OF SLAVE PROCESSES. CAN YOU GIVE ME A GENERAL RULE OF THUMB ON THAT?
FOR >10 SLAVE PROCESSES, DOES THE OVERHEAD INCREASE LINEARLY?
HOW ABOUT FOR <10 SLAVE PROCESSES?

3. Can you identify what step/MPI call (exactly) is taking the majority of
the time? Or does the whole sequence just slow down?

YES, IT SEEMS THAT A BIG PORTION OF THE OVERALL EXECUTION TIME FOR MY CODE
IS SPENT ON SYNCHRONIZATION, WHICH IS FORCED EITHER BY INVOKING MPI_BARRIER
PRIOR TO CALLING MPI_SEND OR BY INVOKING MPI_SEND ITSELF (WITHOUT
MPI_BARRIER).

I SUPPOSE THIS MEANS THE SLAVE NODES ARE SIGNIFICANTLY OUT OF SYNC WITH EACH
OTHER; HOWEVER, I CANNOT UNDERSTAND WHY THIS WOULD BE, SINCE EACH SLAVE IS
RUNNING THE SAME CODE ON THE SAME AMOUNT OF DATA. IS THERE A WAY OF FINDING
OUT WHICH NODE IS HOLDING EVERYTHING UP? CAN ANYONE SUGGEST A WAY TO MAKE
THE SLAVES MORE SYNCHRONIZED, GIVEN THAT THEY SHOULD ALREADY BE SINCE THEY'RE
ALL DOING THE SAME WORK?
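
One possible way to see which slave is holding everything up, sketched here
under assumptions rather than taken from the code discussed in this thread:
have each slave time its own compute phase (for example by calling MPI_Wtime
before and after it), collect those durations on the master, and compare
them. If all slaves start the phase together, the time a slave then spends
waiting at the next barrier is roughly the longest duration minus its own.
The Python below only illustrates that comparison step; the function names
and the sample durations are hypothetical.

```python
def slowest_rank(durations):
    """Return the index (rank) of the process with the longest compute time."""
    return max(range(len(durations)), key=lambda rank: durations[rank])


def barrier_wait_times(durations):
    """Estimate how long each rank idles at a barrier.

    Assumes all ranks started the timed phase at the same moment, so the
    barrier releases when the slowest rank arrives.
    """
    release = max(durations)
    return [release - d for d in durations]


# Made-up per-rank compute times in seconds (index = rank).
sample = [1.0, 1.5, 5.0, 1.25]

print("slowest rank:", slowest_rank(sample))
print("estimated wait per rank:", barrier_wait_times(sample))
```

If one rank stands out in a report like this, the time charged to MPI_BARRIER
or MPI_SEND on the other ranks is most likely time spent waiting for that
rank, not communication cost.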

4. Are you using a shared or a switched network?

I BELIEVE IT'S A SWITCHED NETWORK. WHAT DOES THIS MEAN?

thanks,

Anne