On Feb 15, 2006, at 5:08 PM, rtichy_at_[hidden] wrote:
>> Can you verify that you're invoking LAM's mpirun command?
>
> I tried using every mpirun command on my machine. Including
> mpirun.lam in /usr/lib/lam and I still have the same problem, all
> processes created by lam believe they have rank one...
> MPI::COMM_WORLD.Get_rank() returns 0. I have always used lam over a
> network but was told it can be used to debug on a single machine.
> Is this really the case?
Yes, I run multiple processes on a single machine all the time.
I'm not familiar with your local installation, so I cannot verify
that /usr/lib/lam/mpirun.lam is the Right mpirun for the LAM
installation that you're using (it sounds like it, but it depends on
how your sysadmins set it up).
When you run in the form:

    mpirun -np 4 myapp

then the lamd daemons should set an environment variable named LAMRANK
in each process that they fork. It holds that process's rank in
MPI_COMM_WORLD, so each of the 4 processes should get a different (and
unique) value. Try calling getenv("LAMRANK") in your application to
verify this. If you get NULL back, then you're not being launched by
a LAM daemon, and that is your problem. When LAM gets NULL back from
getenv("LAMRANK"), it assumes that it's running in "singleton" mode.
That means it wasn't launched via LAM's mpirun, it is the only process
in MPI_COMM_WORLD, and it is therefore MCW rank 0.
If you *are* getting valid (and unique) values from getenv("LAMRANK")
and MPI::COMM_WORLD.Get_rank() is still returning 0 from all your
processes, then we need to probe a little deeper to figure out what's
going on.
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/