
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-06-27 07:44:40


On Thu, 26 Jun 2003, Michael Madore wrote:

> As a further data point, I'm seeing similar behavior with the cpi
> example:

Let's tackle cpi since it's a bit simpler than the Mandelbrot example.
They sound like the same (or very similar) problem, so let's see what we
can see.

> [mmadore_at_asl156 mmadore]$ mpirun n0-2 cpi
> Process 0 of 3 on master
> 2 points: pi is approximately 3.1623529411764704, error = 0.0207602875866773
> wall clock time = 0.002841
> 3 points: pi is approximately 3.1508492098656031, error = 0.0092565562758100
> wall clock time = 0.000054
> Process 2 of 3 on 1
> Process 1 of 3 on 0
> [mmadore_at_asl156 mmadore]$ mpirun C cpi
> Process 0 of 5 on master
> 2 points: pi is approximately 3.1623529411764704, error = 0.0207602875866773
> wall clock time = 0.001922
> 3 points: pi is approximately 3.1508492098656031, error = 0.0092565562758100
> wall clock time = 0.000063
> Process 4 of 5 on 3
> Process 1 of 5 on 0
> Process 3 of 5 on 2
> Process 2 of 5 on 1

So this is two runs -- using different scheduling -- that both hang at
more or less the same point.
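
As a sanity check on the numbers printed before the hang, here is a minimal
serial sketch. It assumes cpi uses the usual midpoint-rule integration of
4/(1+x^2) over [0, 1]; the function name midpoint_pi is made up for this
illustration and is not taken from the cpi source.

```python
import math


def midpoint_pi(n):
    """Approximate pi with the midpoint rule on 4/(1+x^2) over [0, 1].

    Assumption: this mirrors the arithmetic cpi performs, but serially,
    without any MPI calls.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(1, n + 1):
        x = h * (i - 0.5)  # midpoint of the i-th subinterval
        total += 4.0 / (1.0 + x * x)
    return h * total


if __name__ == "__main__":
    for n in (2, 3):
        approx = midpoint_pi(n)
        err = abs(approx - math.pi)
        print(f"{n} points: pi is approximately {approx:.16f}, error = {err:.16f}")
```

If the assumption holds, the values for 2 and 3 points should agree with the
quoted output to within floating-point rounding, which would suggest that the
first two results reached process 0 correctly in both runs and that the hang
comes afterwards.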

> And the program doesn't go any further. The output from gdb looks
> similar to the Mandelbrot example:
> [snipped]

Can you verify that cpi is still running on all the nodes where it is
supposed to be running?

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/