On Thu, 22 May 2003, Pak, Anne O wrote:
> [snipped]
> i ran the simulation for about 20 frames.
>
> the very first frame takes a long time (around 20-30 seconds) to
> execute. (maybe there is some initial start up overhead???)
There is some initial overhead, but for fewer than 10 slave processes on a
LAN in an existing LAM universe, it shouldn't take 20-30 seconds.
> for the subsequent frames, it either takes only 0.5 seconds to execute,
> or sometime 18-25 seconds...and these subsequent frames are processing
> the same sets of data (i.e. the updates being sent are the same so
> theoretically, it should take the same time to process).
Can you double check a few things:
1. What nodes are the slaves ending up running on?
2. What is the load on these machines during your runs? (i.e., is anyone
else running big jobs on these nodes during your run such that it causes
your job to run very slowly?)
3. Can you identify what step/MPI call (exactly) is taking the majority of
the time? Or does the whole sequence just slow down?
4. Are you using a shared or a switched network?
5. How big are the updates that you're sending in each subsequent frame
(i.e., total size from matlab->master, and master->each slave)? IIRC, you
said you had many MB of data to send initially, but I don't remember the
exact size.
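
For question 3, one way to narrow it down is to wrap each step of a frame
in a timer and compare the per-step totals between a fast frame and a slow
one. The following is only a minimal sketch in Python, not your actual
code: the StepTimer class and the step names (recv_update, compute,
send_results) are hypothetical. In the MPI program itself the same pattern
would be a pair of MPI_Wtime() calls around each MPI call.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulates wall-clock seconds per named step, one dict per frame."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock      # injectable so the timer can be tested
        self.frames = []        # frames[i] maps step name -> seconds
        self._current = None

    def start_frame(self):
        """Begin a new frame; later step() calls are charged to it."""
        self._current = defaultdict(float)
        self.frames.append(self._current)

    @contextmanager
    def step(self, name):
        """Time the enclosed block and add it to the current frame."""
        t0 = self.clock()
        try:
            yield
        finally:
            self._current[name] += self.clock() - t0

    def slowest_step(self, frame_index):
        """Return the name of the step that took longest in a frame."""
        frame = self.frames[frame_index]
        return max(frame, key=frame.get)


if __name__ == "__main__":
    timer = StepTimer()
    for frame in range(3):
        timer.start_frame()
        # Hypothetical steps; replace the sleeps with the real work.
        with timer.step("recv_update"):
            time.sleep(0.001)
        with timer.step("compute"):
            time.sleep(0.005)
        with timer.step("send_results"):
            time.sleep(0.001)
        per_step = {k: round(v, 4) for k, v in timer.frames[frame].items()}
        print(frame, timer.slowest_step(frame), per_step)
```

If one step accounts for nearly all of the 18-25 seconds in the slow frames
and the others stay flat, that points at a specific call. If every step
slows down together, that points more toward machine load or the network.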
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/