Hi again Jeff,
First off, and this is a little late, thank you so much for the help!
I tried the getenv("LAMRANK") idea with a simple hello-world program, and sure
enough Get_rank() was returning 0 for both processes while LAMRANK was
different (0 and 1). Just so you can see exactly what is going on, here are the
code and the output from the run:
#include <iostream>
#include <cstdlib>
#include <string>
#include "mpi.h"
using namespace std;

int main(int argc, char *argv[]){
    MPI::Init(argc, argv);

    int rank, size;
    const int BUFFER_SIZE = 34;   // length of the message sent below
    size = MPI::COMM_WORLD.Get_size();
    rank = MPI::COMM_WORLD.Get_rank();
    cout << "MPI::COMM_WORLD.Get_size(): " << size << endl;
    cout << "MPI::COMM_WORLD.Get_rank(): " << rank << endl;

    // LAMRANK is the rank the lamd placed in this process' environment
    // (this assumes it is set, i.e. that we were launched by LAM's mpirun).
    string lamrank(getenv("LAMRANK"));
    cout << "lamrank: " << lamrank << endl;

    if(rank == 0){
        string foo("Hello world from rank 0 to rank 1.");
        MPI::COMM_WORLD.Send(foo.c_str(), foo.length(), MPI::CHAR, 1, 1);
    }
    if(rank == 1){
        char buffer[BUFFER_SIZE];
        MPI::COMM_WORLD.Recv(buffer, BUFFER_SIZE, MPI::CHAR, 0, 1);
        // construct with an explicit length; the message is not null-terminated
        string foo(buffer, BUFFER_SIZE);
        cout << foo << endl;
    }

    MPI::Finalize();
    return 0;
}
...and here are the commands I used, from booting the LAM daemon through
compiling and running with mpirun.lam:
rtichy_at_darwin:~/mpi/hello_world$ lamboot
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
rtichy_at_darwin:~/mpi/hello_world$ /etc/alternatives/mpiCC main.cc -o foo
rtichy_at_darwin:~/mpi/hello_world$ /usr/lib/lam/bin/mpirun.lam -np 2 ./foo
MPI::COMM_WORLD.Get_size(): 1
MPI::COMM_WORLD.Get_rank(): 0
lamrank: 0
0 - MPI_SEND : Invalid rank 1
[0] Aborting program !
[0] Aborting program!
p0_9967: p4_error: : 8262
MPI::COMM_WORLD.Get_size(): 1
MPI::COMM_WORLD.Get_rank(): 0
lamrank: 1
0 - MPI_SEND : Invalid rank 1
[0] Aborting program !
[0] Aborting program!
p0_9968: p4_error: : 8262
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
...so you were right about LAMRANK: each process gets a unique value, yet
Get_rank() still returns 0 (and Get_size() returns 1) in both. What should I try next?
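In case it's useful, here is a minimal standalone sketch of just the LAMRANK
check you described (nothing LAM-specific beyond the environment variable; the
NULL branch is the singleton-mode case you mentioned):

#include <cstdlib>
#include <iostream>
using namespace std;

int main(){
    // LAMRANK should be set by the lamd for each process it forks;
    // NULL means the process was not launched via LAM's mpirun.
    const char *lamrank = getenv("LAMRANK");
    if(lamrank == NULL){
        cout << "LAMRANK not set (singleton mode?)" << endl;
    }else{
        cout << "LAMRANK: " << lamrank << endl;
    }
    return 0;
}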
--Rich
Quoting Jeff Squyres <jsquyres_at_[hidden]>:
> On Feb 15, 2006, at 5:08 PM, rtichy_at_[hidden] wrote:
>
> >> Can you verify that you're invoking LAM's mpirun command?
> >
> > I tried using every mpirun command on my machine, including
> > mpirun.lam in /usr/lib/lam, and I still have the same problem: all
> > processes created by LAM believe they have rank zero...
> > MPI::COMM_WORLD.Get_rank() returns 0. I have always used LAM over a
> > network but was told it can be used to debug on a single machine.
> > Is this really the case?
>
> Yes, I run multiple processes on a single machine all the time.
>
> I'm not familiar with your local installation, so I cannot verify
> that /usr/lib/lam/mpirun.lam is the Right mpirun for the LAM
> installation that you're using (it sounds like it, but it depends on
> how your sysadmins set it up).
>
> When you run in the form:
>
> mpirun -np 4 myapp
>
> Then the lamd's should set an environment variable in each process
> that it forks named LAMRANK that indicates that process' rank in
> MPI_COMM_WORLD. Hence, each of the 4 should get different (and
> unique) values. Try calling getenv("LAMRANK") in your application to
> verify this. If you get NULL back, then you're not being launched by
> a LAM daemon, and this is your problem (LAM assumes that if it gets
> NULL back from getenv("LAMRANK") that it's running in "singleton"
> mode, meaning that it wasn't launched via LAM's mpirun and is the
> only process in MPI_COMM_WORLD, and therefore assumes that it is MCW
> rank 0).
>
> If you *are* getting valid (and unique) values from getenv("LAMRANK")
> and MPI::COMM_WORLD.Get_rank() is still returning 0 from all your
> processes, then we need to probe a little deeper to figure out what's
> going on.
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>