
LAM/MPI General User's Mailing List Archives


From: rtichy_at_[hidden]
Date: 2006-02-16 10:33:12


Hi again Jeff,

First off, and this is a little late, thank you so much for the help!

I tried the getenv("LAMRANK") idea with a simple little hello-world program, and
sure enough Get_rank() was returning 0 for both processes while LAMRANK was
different (0 and 1). Just so you can see exactly what is going on, I will post
the code and the output from the run:

#include <iostream>
#include <string>
#include <cstdlib>
#include "mpi.h"

using namespace std;

int main(int argc, char *argv[]){

  MPI::Init(argc, argv );
  int rank, size;
  const int BUFFER_SIZE = 34;

  size = MPI::COMM_WORLD.Get_size();
  rank = MPI::COMM_WORLD.Get_rank();

  cout << "MPI::COMM_WORLD.Get_size(): " << size << endl;
  cout << "MPI::COMM_WORLD.Get_rank(): " << rank << endl;
  // getenv() returns NULL if this process was not started by a lamd,
  // so guard against constructing a std::string from NULL
  const char *lamrank = getenv("LAMRANK");
  cout << "lamrank: " << (lamrank ? lamrank : "(not set)") << endl;

  if(rank == 0){
    string foo("Hello world from rank 0 to rank 1.");
    MPI::COMM_WORLD.Send(foo.c_str(), foo.length(), MPI::CHAR, 1, 1);
  }
  if(rank == 1){
    char buffer[BUFFER_SIZE];
    MPI::Status status;
    MPI::COMM_WORLD.Recv(buffer, BUFFER_SIZE, MPI::CHAR, 0, 1, status);
    // the sender did not include a '\0', so build the string from the
    // actual received length instead of scanning for a terminator
    string foo(buffer, status.Get_count(MPI::CHAR));
    cout << foo << endl;
  }

  MPI::Finalize();
  return 0;
}

... and the commands I used, from starting the LAM daemon through compiling
and running with mpirun.lam:

rtichy_at_darwin:~/mpi/hello_world$ lamboot

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

rtichy_at_darwin:~/mpi/hello_world$ /etc/alternatives/mpiCC main.cc -o foo
rtichy_at_darwin:~/mpi/hello_world$ /usr/lib/lam/bin/mpirun.lam -np 2 ./foo
MPI::COMM_WORLD.Get_size(): 1
MPI::COMM_WORLD.Get_rank(): 0
lamrank: 0

0 - MPI_SEND : Invalid rank 1
[0] Aborting program !
[0] Aborting program!
p0_9967: p4_error: : 8262
MPI::COMM_WORLD.Get_size(): 1
MPI::COMM_WORLD.Get_rank(): 0
lamrank: 1

0 - MPI_SEND : Invalid rank 1
[0] Aborting program !
[0] Aborting program!
p0_9968: p4_error: : 8262
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

...so you were right about LAMRANK. What next?

--Rich

Quoting Jeff Squyres <jsquyres_at_[hidden]>:

> On Feb 15, 2006, at 5:08 PM, rtichy_at_[hidden] wrote:
>
> >> Can you verify that you're invoking LAM's mpirun command?
> >
> > I tried using every mpirun command on my machine. Including
> > mpirun.lam in /usr/lib/lam and I still have the same problem: all
> > processes created by LAM believe they have rank 0, i.e.
> > MPI::COMM_WORLD.Get_rank() returns 0. I have always used LAM over a
> > network but was told it can be used to debug on a single machine.
> > Is this really the case?
>
> Yes, I run multiple processes on a single machine all the time.
>
> I'm not familiar with your local installation, so I cannot verify
> that /usr/lib/lam/mpirun.lam is the Right mpirun for the LAM
> installation that you're using (it sounds like it, but it depends on
> how your sysadmins set it up).
>
> When you run in the form:
>
> mpirun -np 4 myapp
>
> Then the lamd's should set an environment variable in each process
> that it forks named LAMRANK that indicates that process' rank in
> MPI_COMM_WORLD. Hence, each of the 4 should get different (and
> unique) values. Try calling getenv("LAMRANK") in your application to
> verify this. If you get NULL back, then you're not being launched by
> a LAM daemon, and this is your problem (LAM assumes that if it gets
> NULL back from getenv("LAMRANK") then it's running in "singleton"
> mode, meaning that it wasn't launched via LAM's mpirun and is the
> only process in MPI_COMM_WORLD, and therefore assumes that it is MCW
> rank 0).
>
> If you *are* getting valid (and unique) values from getenv("LAMRANK")
> and MPI::COMM_WORLD.Get_rank() is still returning 0 from all your
> processes, then we need to probe a little deeper to figure out what's
> going on.
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>