
LAM/MPI General User's Mailing List Archives


From: rtichy_at_[hidden]
Date: 2006-02-16 17:34:09


Hi again,

Thanks for the hints.

I removed MPICH and LAM and reinstalled just LAM, and it is now working: each
process has a different rank corresponding to its LAMRANK. However, LAM was not
installed with C++ exception support, since the KDE package manager (Adept)
installed it automatically. Can I enable exception support now, or do I need to
reconfigure and recompile, which I would have to do manually?
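
If it comes to a manual rebuild, this is roughly what I expect it to look like
(assuming LAM's configure really has a flag for C++ exception support -- I
would check ./configure --help for the exact name first; the --prefix is just
an example):

  # from the unpacked LAM source directory:
  ./configure --prefix=/usr/local/lam --with-exceptions
  make
  make install

...and then recompile my program with the newly installed mpiCC.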

Again, thanks for all the tips.

-- Rich

Quoting Jeff Squyres <jsquyres_at_[hidden]>:

> Please see my earlier post -- it looks like you are compiling with
> MPICH and running with LAM.
>
> http://www.lam-mpi.org/MailArchives/lam/2006/02/11913.php
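>
> A quick way to double-check which implementation each command comes from
> is something like:
>
>    which mpiCC mpirun
>    mpiCC -showme
>
> (if mpiCC is LAM's wrapper compiler, -showme should print the underlying
> compiler command line it would run; if I remember right, MPICH's wrappers
> use -show instead, so whichever flag is accepted tells you something).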
>
>
> On Feb 16, 2006, at 3:55 PM, rtichy_at_[hidden] wrote:
>
> > Hi,
> >
> > I used the mpirun command from /usr/bin and then mpirun.lam in
> > /usr/lib/lam/bin. I installed this version of MPICH and LAM from Adept
> > (the KDE package manager) on Kubuntu Linux; it was installed more or less
> > automatically, on my home machine. If you think it necessary, I could
> > uninstall both the LAM and MPICH versions downloaded with Adept and
> > recompile from the source on the web site.
> >
> > rtichy_at_darwin:~/mpi/platypus_mercer$ /usr/lib/lam/bin/mpirun.lam
> > ------------------------------------------------------------------------
> > Synopsis:    mpirun [options] <app>
> >              mpirun [options] <where> <program> [<prog args>]
> >
> > Description: Start an MPI application in LAM/MPI.
> >
> > Notes:
> >   [options]       Zero or more of the options listed below
> >   <app>           LAM/MPI appschema
> >   <where>         List of LAM nodes and/or CPUs (examples below)
> >   <program>       Must be a LAM/MPI program that either invokes
> >                   MPI_INIT or has exactly one of its children
> >                   invoke MPI_INIT
> >   <prog args>     Optional list of command line arguments to <program>
> >
> > Options:
> >   -c <num>        Run <num> copies of <program> (same as -np)
> >   -c2c            Use fast library (C2C) mode
> >   -client <rank> <host>:<port>
> >                   Run IMPI job; connect to the IMPI server <host>
> >                   at port <port> as IMPI client number <rank>
> >   -D              Change current working directory of new processes
> >                   to the directory where the executable resides
> >   -f              Do not open stdio descriptors
> >   -ger            Turn on GER mode
> >   -h              Print this help message
> >   -l              Force line-buffered output
> >   -lamd           Use LAM daemon (LAMD) mode (opposite of -c2c)
> >   -nger           Turn off GER mode
> >   -np <num>       Run <num> copies of <program> (same as -c)
> >   -nx             Don't export LAM_MPI_* environment variables
> >   -O              Universe is homogeneous
> >   -pty / -npty    Use/don't use pseudo terminals when stdout is a tty
> >   -s <nodeid>     Load <program> from node <nodeid>
> >   -sigs / -nsigs  Catch/don't catch signals in MPI application
> >   -ssi <n> <arg>  Set environment variable LAM_MPI_SSI_<n>=<arg>
> >   -toff           Enable tracing with generation initially off
> >   -ton, -t        Enable tracing with generation initially on
> >   -tv             Launch processes under TotalView Debugger
> >   -v              Be verbose
> >   -w / -nw        Wait/don't wait for application to complete
> >   -wd <dir>       Change current working directory of new processes
> >                   to <dir>
> >   -x <envlist>    Export environment vars in <envlist>
> >
> > Nodes:       n<list>, e.g., n0-3,5
> > CPUS:        c<list>, e.g., c0-3,5
> > Extras:      h (local node), o (origin node), N (all nodes), C (all CPUs)
> >
> > Examples:    mpirun n0-7 prog1
> >              Executes "prog1" on nodes 0 through 7.
> >
> >              mpirun -lamd -x FOO=bar,DISPLAY N prog2
> >              Executes "prog2" on all nodes using the LAMD RPI.
> >              In the environment of each process, set FOO to the value
> >              "bar", and set DISPLAY to the current value.
> >
> >              mpirun n0 N prog3
> >              Run "prog3" on node 0, *and* all nodes. This executes *2*
> >              copies on n0.
> >
> >              mpirun C prog4 arg1 arg2
> >              Run "prog4" on each available CPU with command line
> >              arguments of "arg1" and "arg2". If each node has a
> >              CPU count of 1, the "C" is equivalent to "N". If at
> >              least one node has a CPU count greater than 1, LAM
> >              will run neighboring ranks of MPI_COMM_WORLD on that
> >              node. For example, if node 0 has a CPU count of 4 and
> >              node 1 has a CPU count of 2, "prog4" will have
> >              MPI_COMM_WORLD ranks 0 through 3 on n0, and ranks 4
> >              and 5 on n1.
> >
> >              mpirun c0 C prog5
> >              Similar to the "prog3" example above, this runs "prog5"
> >              on CPU 0 *and* on each available CPU. This executes
> >              *2* copies on the node where CPU 0 is (i.e., n0).
> >              This is probably not a useful use of the "C" notation;
> >              it is only shown here for an example.
> >
> > Defaults:    -c2c -w -pty -nger -nsigs
> > ------------------------------------------------------------------------
> > rtichy_at_darwin:~/mpi/platypus_mercer$ mpirun
> > ------------------------------------------------------------------------
> > Synopsis:    mpirun [options] <app>
> >              mpirun [options] <where> <program> [<prog args>]
> >
> > Description: Start an MPI application in LAM/MPI.
> >
> > Notes:
> >   [options]       Zero or more of the options listed below
> >   <app>           LAM/MPI appschema
> >   <where>         List of LAM nodes and/or CPUs (examples below)
> >   <program>       Must be a LAM/MPI program that either invokes
> >                   MPI_INIT or has exactly one of its children
> >                   invoke MPI_INIT
> >   <prog args>     Optional list of command line arguments to <program>
> >
> > Options:
> >   -c <num>        Run <num> copies of <program> (same as -np)
> >   -c2c            Use fast library (C2C) mode
> >   -client <rank> <host>:<port>
> >                   Run IMPI job; connect to the IMPI server <host>
> >                   at port <port> as IMPI client number <rank>
> >   -D              Change current working directory of new processes
> >                   to the directory where the executable resides
> >   -f              Do not open stdio descriptors
> >   -ger            Turn on GER mode
> >   -h              Print this help message
> >   -l              Force line-buffered output
> >   -lamd           Use LAM daemon (LAMD) mode (opposite of -c2c)
> >   -nger           Turn off GER mode
> >   -np <num>       Run <num> copies of <program> (same as -c)
> >   -nx             Don't export LAM_MPI_* environment variables
> >   -O              Universe is homogeneous
> >   -pty / -npty    Use/don't use pseudo terminals when stdout is a tty
> >   -s <nodeid>     Load <program> from node <nodeid>
> >   -sigs / -nsigs  Catch/don't catch signals in MPI application
> >   -ssi <n> <arg>  Set environment variable LAM_MPI_SSI_<n>=<arg>
> >   -toff           Enable tracing with generation initially off
> >   -ton, -t        Enable tracing with generation initially on
> >   -tv             Launch processes under TotalView Debugger
> >   -v              Be verbose
> >   -w / -nw        Wait/don't wait for application to complete
> >   -wd <dir>       Change current working directory of new processes
> >                   to <dir>
> >   -x <envlist>    Export environment vars in <envlist>
> >
> > Nodes:       n<list>, e.g., n0-3,5
> > CPUS:        c<list>, e.g., c0-3,5
> > Extras:      h (local node), o (origin node), N (all nodes), C (all CPUs)
> >
> > Examples:    mpirun n0-7 prog1
> >              Executes "prog1" on nodes 0 through 7.
> >
> >              mpirun -lamd -x FOO=bar,DISPLAY N prog2
> >              Executes "prog2" on all nodes using the LAMD RPI.
> >              In the environment of each process, set FOO to the value
> >              "bar", and set DISPLAY to the current value.
> >
> >              mpirun n0 N prog3
> >              Run "prog3" on node 0, *and* all nodes. This executes *2*
> >              copies on n0.
> >
> >              mpirun C prog4 arg1 arg2
> >              Run "prog4" on each available CPU with command line
> >              arguments of "arg1" and "arg2". If each node has a
> >              CPU count of 1, the "C" is equivalent to "N". If at
> >              least one node has a CPU count greater than 1, LAM
> >              will run neighboring ranks of MPI_COMM_WORLD on that
> >              node. For example, if node 0 has a CPU count of 4 and
> >              node 1 has a CPU count of 2, "prog4" will have
> >              MPI_COMM_WORLD ranks 0 through 3 on n0, and ranks 4
> >              and 5 on n1.
> >
> >              mpirun c0 C prog5
> >              Similar to the "prog3" example above, this runs "prog5"
> >              on CPU 0 *and* on each available CPU. This executes
> >              *2* copies on the node where CPU 0 is (i.e., n0).
> >              This is probably not a useful use of the "C" notation;
> >              it is only shown here for an example.
> >
> > Defaults:    -c2c -w -pty -nger -nsigs
> > ------------------------------------------------------------------------
> >
> > They both look the same to me.
> >
> > -- Rich
> >
> >
> > Quoting Esteban Fiallos <erf008_at_[hidden]>:
> >
> >> As Jeff mentioned earlier, this might be caused by using another MPI
> >> implementation's mpirun.
> >>
> >> I had the exact same problem a month ago, and I found out that my PATH
> >> variable was pointing to the MPICH version of mpirun. I changed my PATH
> >> so that it pointed to the LAM/MPI directory first, and that fixed the
> >> problem.
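> >>
> >> In my case the fix was just to put LAM's directory first on the PATH,
> >> something like this (adjust the directory to wherever your LAM binaries
> >> actually live):
> >>
> >>     export PATH=/usr/lib/lam/bin:$PATH
> >>     which mpirun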
> >>
> >> What is the output in the command prompt if you just type mpirun?
> >>
> >> Esteban Fiallos
> >> Data Mining Research Laboratory
> >> Louisiana Tech University
> >> http://dmrl.latech.edu/
> >>
> >> ----- Original Message -----
> >> From: <rtichy_at_[hidden]>
> >> To: "General LAM/MPI mailing list" <lam_at_[hidden]>
> >> Sent: Thursday, February 16, 2006 9:33 AM
> >> Subject: Re: LAM: trouble testing mpi on one processor
> >>
> >>
> >>> Hi again Jeff,
> >>>
> >>> First off, and this is a little late, thank you so much for the
> >>> help!
> >>>
> >>> I tried the getenv("LAMRANK") idea with a simple little hello-world
> >>> type thing, and sure enough Get_rank was returning 0 for both processes
> >>> but LAMRANK was different (0 and 1). Just to be sure you know what is
> >>> going on, I will post the code and the output from the run:
> >>>
> >>> #include <iostream>
> >>> #include <string>
> >>> #include <cstdlib>
> >>> #include "mpi.h"
> >>>
> >>> using namespace std;
> >>>
> >>> int main(int argc, char *argv[]) {
> >>>     MPI::Init(argc, argv);
> >>>     const int BUFFER_SIZE = 34;
> >>>
> >>>     int size = MPI::COMM_WORLD.Get_size();
> >>>     int rank = MPI::COMM_WORLD.Get_rank();
> >>>
> >>>     cout << "MPI::COMM_WORLD.Get_size(): " << size << endl;
> >>>     cout << "MPI::COMM_WORLD.Get_rank(): " << rank << endl;
> >>>
> >>>     // getenv() returns NULL if LAMRANK is not set (e.g., when the
> >>>     // process was not started by a LAM daemon), so guard against it.
> >>>     const char *lamrank = getenv("LAMRANK");
> >>>     cout << "lamrank: " << (lamrank ? lamrank : "(not set)") << endl;
> >>>
> >>>     if (rank == 0) {
> >>>         string foo("Hello world from rank 0 to rank 1.");
> >>>         MPI::COMM_WORLD.Send(foo.c_str(), foo.length(), MPI::CHAR, 1, 1);
> >>>     }
> >>>     if (rank == 1) {
> >>>         char buffer[BUFFER_SIZE];
> >>>         MPI::COMM_WORLD.Recv(buffer, BUFFER_SIZE, MPI::CHAR, 0, 1);
> >>>         // The buffer is not null-terminated; give the length explicitly.
> >>>         string foo(buffer, BUFFER_SIZE);
> >>>         cout << foo << endl;
> >>>     }
> >>>
> >>>     MPI::Finalize();
> >>>     return 0;
> >>> }
> >>>
> >>> ...and the commands I used, from starting the LAM daemon and compiling
> >>> through to running with mpirun.lam:
> >>>
> >>> rtichy_at_darwin:~/mpi/hello_world$ lamboot
> >>>
> >>> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
> >>>
> >>> rtichy_at_darwin:~/mpi/hello_world$ /etc/alternatives/mpiCC main.cc -o foo
> >>> rtichy_at_darwin:~/mpi/hello_world$ /usr/lib/lam/bin/mpirun.lam -np 2 ./foo
> >>> MPI::COMM_WORLD.Get_size(): 1
> >>> MPI::COMM_WORLD.Get_rank(): 0
> >>> lamrank: 0
> >>>
> >>> 0 - MPI_SEND : Invalid rank 1
> >>> [0] Aborting program !
> >>> [0] Aborting program!
> >>> p0_9967: p4_error: : 8262
> >>> MPI::COMM_WORLD.Get_size(): 1
> >>> MPI::COMM_WORLD.Get_rank(): 0
> >>> lamrank: 1
> >>>
> >>> 0 - MPI_SEND : Invalid rank 1
> >>> [0] Aborting program !
> >>> [0] Aborting program!
> >>> p0_9968: p4_error: : 8262
> >>>
> >>> -----------------------------------------------------------------------
> >>> It seems that [at least] one of the processes that was started with
> >>> mpirun did not invoke MPI_INIT before quitting (it is possible that
> >>> more than one process did not invoke MPI_INIT -- mpirun was only
> >>> notified of the first one, which was on node n0).
> >>>
> >>> mpirun can *only* be used with MPI programs (i.e., programs that
> >>> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec"
> >>> program
> >>> to run non-MPI programs over the lambooted nodes.
> >>>
> >>> -----------------------------------------------------------------------
> >>>
> >>> ...so you were right about LAMRANK. What next?
> >>>
> >>> --Rich
> >>>
> >>>
> >>> Quoting Jeff Squyres <jsquyres_at_[hidden]>:
> >>>
> >>>> On Feb 15, 2006, at 5:08 PM, rtichy_at_[hidden] wrote:
> >>>>
> >>>>>> Can you verify that you're invoking LAM's mpirun command?
> >>>>>
> >>>>> I tried using every mpirun command on my machine, including mpirun.lam
> >>>>> in /usr/lib/lam, and I still have the same problem: all processes
> >>>>> created by LAM believe they have rank 0, i.e.,
> >>>>> MPI::COMM_WORLD.Get_rank() returns 0 in every one of them. I have
> >>>>> always used LAM over a network, but was told it can be used to debug
> >>>>> on a single machine. Is this really the case?
> >>>>
> >>>> Yes, I run multiple processes on a single machine all the time.
> >>>>
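> >>>> For example, a boot schema (hostfile) that just names the local machine
> >>>> with a CPU count, something like this (the file name is arbitrary):
> >>>>
> >>>> $ cat lamhosts
> >>>> localhost cpu=2
> >>>> $ lamboot -v lamhosts
> >>>> $ mpirun -np 2 ./foo
> >>>> $ lamhalt
> >>>>
> >>>> should give you two processes with ranks 0 and 1 on the one machine.
> >>>>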
> >>>> I'm not familiar with your local installation, so I cannot verify
> >>>> that /usr/lib/lam/mpirun.lam is the Right mpirun for the LAM
> >>>> installation that you're using (it sounds like it, but it
> >>>> depends on
> >>>> how your sysadmins set it up).
> >>>>
> >>>> When you run in the form:
> >>>>
> >>>> mpirun -np 4 myapp
> >>>>
> >>>> Then the lamds should set an environment variable named LAMRANK in each
> >>>> process that they fork, indicating that process' rank in
> >>>> MPI_COMM_WORLD. Hence, each of the 4 should get a different (and
> >>>> unique) value. Try calling getenv("LAMRANK") in your application to
> >>>> verify this. If you get NULL back, then you're not being launched by a
> >>>> LAM daemon, and this is your problem (LAM assumes that if it gets NULL
> >>>> back from getenv("LAMRANK"), it's running in "singleton" mode, meaning
> >>>> that it wasn't launched via LAM's mpirun and is the only process in
> >>>> MPI_COMM_WORLD, and it therefore assumes that it is MCW rank 0).
> >>>>
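> >>>> If it helps, the check I have in mind is roughly this standalone sketch
> >>>> (run it once by hand and once under mpirun to see both cases):
> >>>>
> >>>> #include <cstdlib>
> >>>> #include <iostream>
> >>>>
> >>>> int main() {
> >>>>     // LAM's daemons set LAMRANK in every process they launch.
> >>>>     const char *lamrank = std::getenv("LAMRANK");
> >>>>     if (lamrank == NULL) {
> >>>>         // No LAMRANK: this process was not started by a LAM daemon,
> >>>>         // so LAM would treat it as a singleton (MCW rank 0).
> >>>>         std::cout << "LAMRANK not set" << std::endl;
> >>>>     } else {
> >>>>         std::cout << "LAMRANK = " << lamrank << std::endl;
> >>>>     }
> >>>>     return 0;
> >>>> }
> >>>>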
> >>>> If you *are* getting valid (and unique) values from getenv("LAMRANK")
> >>>> and MPI::COMM_WORLD.Get_rank() is still returning 0 from all your
> >>>> processes, then we need to probe a little deeper to figure out what's
> >>>> going on.
> >>>>
> >>>> --
> >>>> {+} Jeff Squyres
> >>>> {+} The Open MPI Project
> >>>> {+} http://www.open-mpi.org/
> >>>>
>
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>