Please see my earlier post -- it looks like you are compiling with
MPICH and running with LAM.
http://www.lam-mpi.org/MailArchives/lam/2006/02/11913.php
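
A couple of quick ways to check this; the exact paths and wrapper flags
below are guesses based on your transcript, so adjust as needed:

  ls -l /etc/alternatives/mpiCC   # which wrapper compiler that symlink points to
  /etc/alternatives/mpiCC -show   # MPICH's wrappers generally accept -show ...
  mpiCC.lam -showme               # ... and LAM's accept -showme (name/path may differ)
  ldd ./foo | grep -i mpi         # which MPI library the executable is linked against
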
On Feb 16, 2006, at 3:55 PM, rtichy_at_[hidden] wrote:
> Hi,
>
> I used the mpirun command from /usr/bin and then mpirun.lam in
> /usr/lib/lam/bin. I installed both MPICH and LAM from Adept (the KDE
> package manager) on Kubuntu Linux on my home machine, so the install
> was more or less automatic. If you think it necessary, I could
> uninstall both the LAM and MPICH versions that Adept downloaded and
> recompile from the source on the website.
>
> rtichy_at_darwin:~/mpi/platypus_mercer$ /usr/lib/lam/bin/mpirun.lam
> -----------------------------------------------------------------------------
> Synopsis:     mpirun [options] <app>
>               mpirun [options] <where> <program> [<prog args>]
>
> Description:  Start an MPI application in LAM/MPI.
>
> Notes:
>   [options]        Zero or more of the options listed below
>   <app>            LAM/MPI appschema
>   <where>          List of LAM nodes and/or CPUs (examples below)
>   <program>        Must be a LAM/MPI program that either
>                    invokes MPI_INIT or has exactly one of
>                    its children invoke MPI_INIT
>   <prog args>      Optional list of command line arguments to <program>
>
> Options:
>   -c <num>         Run <num> copies of <program> (same as -np)
>   -c2c             Use fast library (C2C) mode
>   -client <rank> <host>:<port>
>                    Run IMPI job; connect to the IMPI server <host>
>                    at port <port> as IMPI client number <rank>
>   -D               Change current working directory of new
>                    processes to the directory where the
>                    executable resides
>   -f               Do not open stdio descriptors
>   -ger             Turn on GER mode
>   -h               Print this help message
>   -l               Force line-buffered output
>   -lamd            Use LAM daemon (LAMD) mode (opposite of -c2c)
>   -nger            Turn off GER mode
>   -np <num>        Run <num> copies of <program> (same as -c)
>   -nx              Don't export LAM_MPI_* environment variables
>   -O               Universe is homogeneous
>   -pty / -npty     Use/don't use pseudo terminals when stdout is a tty
>   -s <nodeid>      Load <program> from node <nodeid>
>   -sigs / -nsigs   Catch/don't catch signals in MPI application
>   -ssi <n> <arg>   Set environment variable LAM_MPI_SSI_<n>=<arg>
>   -toff            Enable tracing with generation initially off
>   -ton, -t         Enable tracing with generation initially on
>   -tv              Launch processes under TotalView Debugger
>   -v               Be verbose
>   -w / -nw         Wait/don't wait for application to complete
>   -wd <dir>        Change current working directory of new
>                    processes to <dir>
>   -x <envlist>     Export environment vars in <envlist>
>
> Nodes:        n<list>, e.g., n0-3,5
> CPUS:         c<list>, e.g., c0-3,5
> Extras:       h (local node), o (origin node), N (all nodes), C (all CPUs)
>
> Examples:     mpirun n0-7 prog1
>               Executes "prog1" on nodes 0 through 7.
>
>               mpirun -lamd -x FOO=bar,DISPLAY N prog2
>               Executes "prog2" on all nodes using the LAMD RPI.
>               In the environment of each process, set FOO to the value
>               "bar", and set DISPLAY to the current value.
>
>               mpirun n0 N prog3
>               Run "prog3" on node 0, *and* all nodes.  This executes *2*
>               copies on n0.
>
>               mpirun C prog4 arg1 arg2
>               Run "prog4" on each available CPU with command line
>               arguments of "arg1" and "arg2".  If each node has a
>               CPU count of 1, the "C" is equivalent to "N".  If at
>               least one node has a CPU count greater than 1, LAM
>               will run neighboring ranks of MPI_COMM_WORLD on that
>               node.  For example, if node 0 has a CPU count of 4 and
>               node 1 has a CPU count of 2, "prog4" will have
>               MPI_COMM_WORLD ranks 0 through 3 on n0, and ranks 4
>               and 5 on n1.
>
>               mpirun c0 C prog5
>               Similar to the "prog3" example above, this runs "prog5"
>               on CPU 0 *and* on each available CPU.  This executes
>               *2* copies on the node where CPU 0 is (i.e., n0).
>               This is probably not a useful use of the "C" notation;
>               it is only shown here for an example.
>
> Defaults:     -c2c -w -pty -nger -nsigs
> -----------------------------------------------------------------------------
> rtichy_at_darwin:~/mpi/platypus_mercer$ mpirun
> -----------------------------------------------------------------------------
> Synopsis:     mpirun [options] <app>
>               mpirun [options] <where> <program> [<prog args>]
>
> Description:  Start an MPI application in LAM/MPI.
>
> Notes:
>   [options]        Zero or more of the options listed below
>   <app>            LAM/MPI appschema
>   <where>          List of LAM nodes and/or CPUs (examples below)
>   <program>        Must be a LAM/MPI program that either
>                    invokes MPI_INIT or has exactly one of
>                    its children invoke MPI_INIT
>   <prog args>      Optional list of command line arguments to <program>
>
> Options:
>   -c <num>         Run <num> copies of <program> (same as -np)
>   -c2c             Use fast library (C2C) mode
>   -client <rank> <host>:<port>
>                    Run IMPI job; connect to the IMPI server <host>
>                    at port <port> as IMPI client number <rank>
>   -D               Change current working directory of new
>                    processes to the directory where the
>                    executable resides
>   -f               Do not open stdio descriptors
>   -ger             Turn on GER mode
>   -h               Print this help message
>   -l               Force line-buffered output
>   -lamd            Use LAM daemon (LAMD) mode (opposite of -c2c)
>   -nger            Turn off GER mode
>   -np <num>        Run <num> copies of <program> (same as -c)
>   -nx              Don't export LAM_MPI_* environment variables
>   -O               Universe is homogeneous
>   -pty / -npty     Use/don't use pseudo terminals when stdout is a tty
>   -s <nodeid>      Load <program> from node <nodeid>
>   -sigs / -nsigs   Catch/don't catch signals in MPI application
>   -ssi <n> <arg>   Set environment variable LAM_MPI_SSI_<n>=<arg>
>   -toff            Enable tracing with generation initially off
>   -ton, -t         Enable tracing with generation initially on
>   -tv              Launch processes under TotalView Debugger
>   -v               Be verbose
>   -w / -nw         Wait/don't wait for application to complete
>   -wd <dir>        Change current working directory of new
>                    processes to <dir>
>   -x <envlist>     Export environment vars in <envlist>
>
> Nodes:        n<list>, e.g., n0-3,5
> CPUS:         c<list>, e.g., c0-3,5
> Extras:       h (local node), o (origin node), N (all nodes), C (all CPUs)
>
> Examples:     mpirun n0-7 prog1
>               Executes "prog1" on nodes 0 through 7.
>
>               mpirun -lamd -x FOO=bar,DISPLAY N prog2
>               Executes "prog2" on all nodes using the LAMD RPI.
>               In the environment of each process, set FOO to the value
>               "bar", and set DISPLAY to the current value.
>
>               mpirun n0 N prog3
>               Run "prog3" on node 0, *and* all nodes.  This executes *2*
>               copies on n0.
>
>               mpirun C prog4 arg1 arg2
>               Run "prog4" on each available CPU with command line
>               arguments of "arg1" and "arg2".  If each node has a
>               CPU count of 1, the "C" is equivalent to "N".  If at
>               least one node has a CPU count greater than 1, LAM
>               will run neighboring ranks of MPI_COMM_WORLD on that
>               node.  For example, if node 0 has a CPU count of 4 and
>               node 1 has a CPU count of 2, "prog4" will have
>               MPI_COMM_WORLD ranks 0 through 3 on n0, and ranks 4
>               and 5 on n1.
>
>               mpirun c0 C prog5
>               Similar to the "prog3" example above, this runs "prog5"
>               on CPU 0 *and* on each available CPU.  This executes
>               *2* copies on the node where CPU 0 is (i.e., n0).
>               This is probably not a useful use of the "C" notation;
>               it is only shown here for an example.
>
> Defaults:     -c2c -w -pty -nger -nsigs
> -----------------------------------------------------------------------------
>
> They both look the same to me.
>
> -- Rich
>
>
> Quoting Esteban Fiallos <erf008_at_[hidden]>:
>
>> As Jeff mentioned earlier, this might be caused by using another MPI
>> implementation's mpirun.
>>
>> I had the exact same problem a month ago and found out that my PATH
>> variable was pointing to the MPICH version of mpirun. I changed my
>> PATH so that it pointed to the LAM/MPI directory first, and that fixed
>> the problem.
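>>
>> For example, something along these lines is what I did; the LAM
>> directory below is just a placeholder, so use wherever your LAM
>> install actually lives:
>>
>>   type -a mpirun                        # list every mpirun on the PATH, in order
>>   export PATH=/path/to/lam/bin:$PATH    # put the LAM/MPI directory first
>>   hash -r                               # clear the shell's cached command lookup
>>   which mpirun                          # confirm LAM's copy now comes first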
>>
>> What is the output in the command prompt if you just type mpirun?
>>
>> Esteban Fiallos
>> Data Mining Research Laboratory
>> Louisiana Tech University
>> http://dmrl.latech.edu/
>>
>> ----- Original Message -----
>> From: <rtichy_at_[hidden]>
>> To: "General LAM/MPI mailing list" <lam_at_[hidden]>
>> Sent: Thursday, February 16, 2006 9:33 AM
>> Subject: Re: LAM: trouble testing mpi on one processor
>>
>>
>>> Hi again Jeff,
>>>
>>> First off, and this is a little late, thank you so much for the
>>> help!
>>>
>>> I tried the getenv("LAMRANK") idea with a simple little hello-world
>>> program, and sure enough Get_rank was returning 0 for both processes
>>> while LAMRANK was different (0 and 1). Just to be sure you know what
>>> is going on, I will post the code and the output from the run:
>>>
>>> #include <iostream>
>>> #include <string>    // needed for std::string
>>> #include <cstdlib>   // getenv()
>>> #include "mpi.h"
>>>
>>> using namespace std;
>>>
>>> int main(int argc, char *argv[]) {
>>>
>>>     MPI::Init(argc, argv);
>>>     int rank, size;
>>>     const int BUFFER_SIZE = 34;
>>>
>>>     size = MPI::COMM_WORLD.Get_size();
>>>     rank = MPI::COMM_WORLD.Get_rank();
>>>
>>>     cout << "MPI::COMM_WORLD.Get_size(): " << size << endl;
>>>     cout << "MPI::COMM_WORLD.Get_rank(): " << rank << endl;
>>>
>>>     // getenv() returns NULL when LAMRANK is not set (singleton mode),
>>>     // so guard against that before printing it
>>>     const char *lamrank = getenv("LAMRANK");
>>>     cout << "lamrank: " << (lamrank ? lamrank : "(not set)") << endl;
>>>
>>>     if (rank == 0) {
>>>         string foo("Hello world from rank 0 to rank 1.");
>>>         MPI::COMM_WORLD.Send(foo.c_str(), foo.length(), MPI::CHAR, 1, 1);
>>>     }
>>>     if (rank == 1) {
>>>         char buffer[BUFFER_SIZE];
>>>         MPI::Status status;
>>>         MPI::COMM_WORLD.Recv(buffer, BUFFER_SIZE, MPI::CHAR, 0, 1, status);
>>>         // buffer is not null-terminated, so build the string from the
>>>         // number of characters actually received
>>>         string foo(buffer, status.Get_count(MPI::CHAR));
>>>         cout << foo << endl;
>>>     }
>>>
>>>     MPI::Finalize();
>>>     return 0;
>>> }
>>>
>>> ...and the commands I used, from starting the LAM daemon and
>>> compiling, through to running with mpirun.lam:
>>>
>>> rtichy_at_darwin:~/mpi/hello_world$ lamboot
>>>
>>> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>>>
>>> rtichy_at_darwin:~/mpi/hello_world$ /etc/alternatives/mpiCC main.cc -o foo
>>> rtichy_at_darwin:~/mpi/hello_world$ /usr/lib/lam/bin/mpirun.lam -np 2 ./foo
>>> MPI::COMM_WORLD.Get_size(): 1
>>> MPI::COMM_WORLD.Get_rank(): 0
>>> lamrank: 0
>>>
>>> 0 - MPI_SEND : Invalid rank 1
>>> [0] Aborting program !
>>> [0] Aborting program!
>>> p0_9967: p4_error: : 8262
>>> MPI::COMM_WORLD.Get_size(): 1
>>> MPI::COMM_WORLD.Get_rank(): 0
>>> lamrank: 1
>>>
>>> 0 - MPI_SEND : Invalid rank 1
>>> [0] Aborting program !
>>> [0] Aborting program!
>>> p0_9968: p4_error: : 8262
>>>
>>> -----------------------------------------------------------------------------
>>> It seems that [at least] one of the processes that was started with
>>> mpirun did not invoke MPI_INIT before quitting (it is possible that
>>> more than one process did not invoke MPI_INIT -- mpirun was only
>>> notified of the first one, which was on node n0).
>>>
>>> mpirun can *only* be used with MPI programs (i.e., programs that
>>> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec"
>>> program
>>> to run non-MPI programs over the lambooted nodes.
>>>
>>> -----------------------------------------------------------------------------
>>>
>>> ...so you were right about LAMRANK. What next?
>>>
>>> --Rich
>>>
>>>
>>> Quoting Jeff Squyres <jsquyres_at_[hidden]>:
>>>
>>>> On Feb 15, 2006, at 5:08 PM, rtichy_at_[hidden] wrote:
>>>>
>>>>>> Can you verify that you're invoking LAM's mpirun command?
>>>>>
>>>>> I tried using every mpirun command on my machine, including
>>>>> mpirun.lam in /usr/lib/lam, and I still have the same problem: all
>>>>> processes created by LAM believe they have rank 0, i.e.,
>>>>> MPI::COMM_WORLD.Get_rank() returns 0. I have always used LAM over a
>>>>> network but was told it can be used to debug on a single machine.
>>>>> Is this really the case?
>>>>
>>>> Yes, I run multiple processes on a single machine all the time.
>>>>
>>>> I'm not familiar with your local installation, so I cannot verify
>>>> that /usr/lib/lam/mpirun.lam is the Right mpirun for the LAM
>>>> installation that you're using (it sounds like it, but it
>>>> depends on
>>>> how your sysadmins set it up).
>>>>
>>>> When you run in the form:
>>>>
>>>> mpirun -np 4 myapp
>>>>
>>>> Then the lamds should set an environment variable named LAMRANK in
>>>> each process that they fork, indicating that process's rank in
>>>> MPI_COMM_WORLD. Hence, each of the 4 should get different (and
>>>> unique) values. Try calling getenv("LAMRANK") in your application to
>>>> verify this. If you get NULL back, then you're not being launched by
>>>> a LAM daemon, and this is your problem (LAM assumes that if it gets
>>>> NULL back from getenv("LAMRANK") it's running in "singleton" mode,
>>>> meaning that it wasn't launched via LAM's mpirun and is the only
>>>> process in MPI_COMM_WORLD, and therefore it assumes that it is MCW
>>>> rank 0).
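>>>>
>>>> If you'd rather not touch your source, a tiny wrapper script along
>>>> these lines (the names here are just placeholders) should show what
>>>> each forked process sees:
>>>>
>>>>   #!/bin/sh
>>>>   # rankcheck.sh -- print the rank LAM handed this process, then exec
>>>>   # the real MPI program so that MPI_INIT still gets invoked
>>>>   echo "LAMRANK=$LAMRANK"
>>>>   exec ./your_mpi_program "$@"
>>>>
>>>> Make it executable and launch it with LAM's mpirun (e.g., "mpirun -np
>>>> 4 ./rankcheck.sh"); each copy should print a different LAMRANK value.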
>>>>
>>>> If you *are* getting valid (and unique) values from
>>>> getenv("LAMRANK") and MPI::COMM_WORLD.Get_rank() is still returning
>>>> 0 from all your processes, then we need to probe a little deeper to
>>>> figure out what's going on.
>>>>
>>>> --
>>>> {+} Jeff Squyres
>>>> {+} The Open MPI Project
>>>> {+} http://www.open-mpi.org/
>>>>
>>>>
>>>> _______________________________________________
>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/