As Jeff mentioned earlier, this might be caused by using another MPI
implementation's mpirun.
I had exactly the same problem a month ago, and I found out that my PATH
variable was picking up the MPICH version of mpirun. I changed my PATH so
that it pointed to the LAM/MPI directory first, and that fixed the problem.
What output do you get at the command prompt if you just type mpirun?
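For example, commands along these lines will show which binaries your shell
is actually picking up (the exact paths on your machine will of course
differ):

    which mpirun
    which mpiCC
    laminfo

laminfo ships with LAM and prints the LAM version and configuration, so if
it and "which mpirun" point to different installations you are probably
mixing implementations. The p0_xxxx: p4_error lines in your output also look
like they come from MPICH's ch_p4 device, so it may be worth checking what
/etc/alternatives/mpiCC really points to (for example with
"ls -l /etc/alternatives/mpiCC"): a program linked against MPICH but
launched with LAM's mpirun will typically come up as a size-1 MPI job with
rank 0, which is exactly what you are seeing.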
Esteban Fiallos
Data Mining Research Laboratory
Louisiana Tech University
http://dmrl.latech.edu/
----- Original Message -----
From: <rtichy_at_[hidden]>
To: "General LAM/MPI mailing list" <lam_at_[hidden]>
Sent: Thursday, February 16, 2006 9:33 AM
Subject: Re: LAM: trouble testing mpi on one processor
> Hi again Jeff,
>
> First off, and this is a little late, thank you so much for the help!
>
> I tried the getenv("LAMRANK") idea with a simple little hello-world
> program, and sure enough Get_rank() was returning 0 for both processes,
> but LAMRANK was different (0 and 1). Just so you can see exactly what is
> going on, I will post the code and the output from the run:
>
> #include <iostream>
> #include <cstdlib>
> #include <string>
> #include "mpi.h"
>
> using namespace std;
>
> int main(int argc, char *argv[]){
>
>     MPI::Init(argc, argv);
>     int rank, size;
>     const int BUFFER_SIZE = 34;
>
>     size = MPI::COMM_WORLD.Get_size();
>     rank = MPI::COMM_WORLD.Get_rank();
>
>     cout << "MPI::COMM_WORLD.Get_size(): " << size << endl;
>     cout << "MPI::COMM_WORLD.Get_rank(): " << rank << endl;
>
>     // getenv() returns NULL if LAMRANK is not set (i.e. singleton mode).
>     const char *lamrank = getenv("LAMRANK");
>     cout << "lamrank: " << (lamrank ? lamrank : "(not set)") << endl;
>
>     if(rank == 0){
>         string foo("Hello world from rank 0 to rank 1.");
>         MPI::COMM_WORLD.Send(foo.c_str(), foo.length(), MPI::CHAR, 1, 1);
>     }
>     if(rank == 1){
>         char buffer[BUFFER_SIZE];
>         MPI::COMM_WORLD.Recv(buffer, BUFFER_SIZE, MPI::CHAR, 0, 1);
>         // buffer is not null-terminated, so pass the length explicitly
>         string foo(buffer, BUFFER_SIZE);
>         cout << foo << endl;
>     }
>
>     MPI::Finalize();
>     return 0;
> }
>
> ... and the commands I used, from booting the LAM daemon and compiling
> through to running with mpirun.lam:
>
> rtichy_at_darwin:~/mpi/hello_world$ lamboot
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> rtichy_at_darwin:~/mpi/hello_world$ /etc/alternatives/mpiCC main.cc -o foo
> rtichy_at_darwin:~/mpi/hello_world$ /usr/lib/lam/bin/mpirun.lam -np 2 ./foo
> MPI::COMM_WORLD.Get_size(): 1
> MPI::COMM_WORLD.Get_rank(): 0
> lamrank: 0
>
> 0 - MPI_SEND : Invalid rank 1
> [0] Aborting program !
> [0] Aborting program!
> p0_9967: p4_error: : 8262
> MPI::COMM_WORLD.Get_size(): 1
> MPI::COMM_WORLD.Get_rank(): 0
> lamrank: 1
>
> 0 - MPI_SEND : Invalid rank 1
> [0] Aborting program !
> [0] Aborting program!
> p0_9968: p4_error: : 8262
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
> ...so you were right about LAMRANK. What next?
>
> --Rich
>
>
> Quoting Jeff Squyres <jsquyres_at_[hidden]>:
>
>> On Feb 15, 2006, at 5:08 PM, rtichy_at_[hidden] wrote:
>>
>> >> Can you verify that you're invoking LAM's mpirun command?
>> >
>> > I tried using every mpirun command on my machine, including
>> > mpirun.lam in /usr/lib/lam, and I still have the same problem: all
>> > processes created by LAM believe they have rank zero;
>> > MPI::COMM_WORLD.Get_rank() returns 0 in each of them. I have always
>> > used LAM over a network, but was told it can also be used to debug
>> > on a single machine. Is this really the case?
>>
>> Yes, I run multiple processes on a single machine all the time.
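>> For example, on a single machine something like this works (a minimal
>> sketch; the hostfile name is just a placeholder):
>>
>> echo "localhost cpu=2" > lamhosts
>> lamboot -v lamhosts
>> mpirun -np 2 ./foo
>> lamhalt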
>>
>> I'm not familiar with your local installation, so I cannot verify
>> that /usr/lib/lam/mpirun.lam is the Right mpirun for the LAM
>> installation that you're using (it sounds like it, but it depends on
>> how your sysadmins set it up).
>>
>> When you run in the form:
>>
>> mpirun -np 4 myapp
>>
>> Then the lamds should set an environment variable in each process
>> that it forks named LAMRANK that indicates that process' rank in
>> MPI_COMM_WORLD. Hence, each of the 4 should get different (and
>> unique) values. Try calling getenv("LAMRANK") in your application to
>> verify this. If you get NULL back, then you're not being launched by
>> a LAM daemon, and this is your problem (LAM assumes that if it gets
>> NULL back from getenv("LAMRANK") that it's running in "singleton"
>> mode, meaning that it wasn't launched via LAM's mpirun and is the
>> only process in MPI_COMM_WORLD, and therefore assumes that it is MCW
>> rank 0).
>>
>> If you *are* getting valid (and unique) values from getenv("LAMRANK")
>> and MPI::COMM_WORLD.Get_rank() is still returning 0 from all your
>> processes, then we need to probe a little deeper to figure out what's
>> going on.
>>
>> --
>> {+} Jeff Squyres
>> {+} The Open MPI Project
>> {+} http://www.open-mpi.org/
>>
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>