On Thu, 26 Jun 2003, Michael Madore wrote:
> As a further data point, I'm seeing similar behavior with the cpi
> example:
Let's tackle cpi since it's a bit simpler than the Mandelbrot example.
They sound like the same (or very similar) problem, so let's see what we
can see.
> [mmadore_at_asl156 mmadore]$ mpirun n0-2 cpi
> Process 0 of 3 on master
> 2 points: pi is approximately 3.1623529411764704, error = 0.0207602875866773
> wall clock time = 0.002841
> 3 points: pi is approximately 3.1508492098656031, error = 0.0092565562758100
> wall clock time = 0.000054
> Process 2 of 3 on 1
> Process 1 of 3 on 0
> [mmadore_at_asl156 mmadore]$ mpirun C cpi
> Process 0 of 5 on master
> 2 points: pi is approximately 3.1623529411764704, error = 0.0207602875866773
> wall clock time = 0.001922
> 3 points: pi is approximately 3.1508492098656031, error = 0.0092565562758100
> wall clock time = 0.000063
> Process 4 of 5 on 3
> Process 1 of 5 on 0
> Process 3 of 5 on 2
> Process 2 of 5 on 1
So this is two runs -- using different scheduling -- that both hang at
more or less the same point.
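As a quick sanity check on the numbers that did get printed, here is a minimal serial sketch in Python. It assumes cpi uses the usual midpoint-rule integration of 4/(1+x^2) over [0,1], and that "N points" means N intervals; the function name midpoint_pi is made up for illustration and is not part of cpi. If those assumptions hold, the values for 2 and 3 points should agree with the output quoted above, which would suggest the arithmetic is fine for the iterations that completed and that the problem lies elsewhere.

```python
import math


def midpoint_pi(n):
    """Approximate pi by integrating 4/(1+x^2) on [0, 1] with n midpoints.

    Hypothetical serial stand-in for what cpi is assumed to compute;
    the real program splits this sum across the MPI ranks.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(1, n + 1):
        x = h * (i - 0.5)  # midpoint of the i-th interval
        total += 4.0 / (1.0 + x * x)
    return h * total


if __name__ == "__main__":
    for n in (2, 3):
        approx = midpoint_pi(n)
        print("%d points: pi is approximately %.16f, error = %.16f"
              % (n, approx, abs(approx - math.pi)))
```

This only reproduces the serial arithmetic; it says nothing about where the parallel run is stuck.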
> And the program doesn't go any further. The output from gdb looks
> similar to the Mandelbrot example:
> [snipped]
Can you verify that cpi is still running on all the nodes that it is
supposed to be running on?
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/