[snipped some more...]
{jeff squyres wrote:}
There is some initial overhead, but for <10 slave processes on a LAM in an
existing LAM universe, it shouldn't take 20-30 seconds.
IT SEEMS YOU ARE SAYING THAT THE AMOUNT OF OVERHEAD IS CLOSELY TIED TO THE
NUMBER OF SLAVE PROCESSES. CAN YOU GIVE ME A GENERAL RULE OF THUMB ON THAT?
FOR >10 SLAVE PROCESSES, DOES THE OVERHEAD INCREASE LINEARLY?
HOW ABOUT FOR <10 SLAVE PROCESSES?
3. Can you identify what step/MPI call (exactly) is taking the majority of
the time? Or does the whole sequence just slow down?
YES, IT SEEMS THAT A BIG PORTION OF THE OVERALL EXECUTION TIME FOR MY CODE
IS TIED TO THE ACT OF SYNCHRONIZATION THAT IS FORCED EITHER BY INVOKING
MPI_BARRIER PRIOR TO CALLING MPI_SEND OR BY INVOKING MPI_SEND (WITHOUT
MPI_BARRIER).
I SUPPOSE THIS MEANS THE SLAVE NODES ARE SIGNIFICANTLY OUT OF SYNC WITH EACH
OTHER; HOWEVER, I CANNOT UNDERSTAND WHY THIS WOULD BE, SINCE EACH SLAVE IS
RUNNING THE SAME CODE ON THE SAME AMOUNT OF DATA. IS THERE A WAY OF FINDING
OUT WHICH NODE IS HOLDING EVERYTHING UP? CAN ANYONE SUGGEST A WAY TO MAKE
THE SLAVES MORE SYNCHRONIZED, GIVEN THAT THEY SHOULD ALREADY BE, SINCE THEY'RE
ALL DOING THE SAME WORK?
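
One possible way to find out which node is holding everything up is to time
how long each process waits inside the barrier: the process that waits the
least is the one that arrived last. The following is only a minimal sketch in
C, not a tested diagnostic. The names slowest_rank and timed_barrier and the
USE_MPI build flag are hypothetical; MPI_Wtime, MPI_Barrier, and MPI_Gather
are the standard MPI calls. It assumes the per-process clocks do not need to
agree, since each process measures only its own elapsed wait.

```c
#include <stddef.h>

/* Return the index of the rank that waited the shortest time in the
 * barrier. That rank arrived last, so it is the likely straggler.
 * Returns -1 if there are no samples. */
int slowest_rank(const double *wait_seconds, size_t n)
{
    if (n == 0)
        return -1;

    size_t min_i = 0;
    for (size_t i = 1; i < n; i++) {
        if (wait_seconds[i] < wait_seconds[min_i])
            min_i = i;
    }
    return (int)min_i;
}

#ifdef USE_MPI /* hypothetical flag: build with e.g. mpicc -DUSE_MPI */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: call in place of a bare MPI_Barrier(comm). */
void timed_barrier(MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Each rank measures only its own time spent waiting. */
    double t0 = MPI_Wtime();
    MPI_Barrier(comm);
    double waited = MPI_Wtime() - t0;

    /* The receive buffer only matters on the root (rank 0). */
    double *all = NULL;
    if (rank == 0) {
        all = malloc((size_t)size * sizeof *all);
        if (all == NULL)
            MPI_Abort(comm, 1);
    }

    MPI_Gather(&waited, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0, comm);

    if (rank == 0) {
        printf("rank %d arrived last\n", slowest_rank(all, (size_t)size));
        free(all);
    }
}
#endif
```

If the same rank shows up as the last arrival on every iteration, that would
point at one slow node or link rather than at the code itself.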
4. Are you using a shared or a switched network?
I BELIEVE IT'S A SWITCHED NETWORK. WHAT DOES THIS MEAN?
thanks,
Anne