LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-09-11 08:51:02


On Sep 7, 2007, at 3:27 PM, pat.o'bryant_at_[hidden] wrote:

> We have a group of users that uses "lamexec" along with an application
> schema file to execute MPMD jobs. Since we upgraded to "lam-7.1.2.8",
> "lamexec" fails with the messages shown below. Interestingly, the users
> guide for "lam 7.1.2" has the following text:
>
> "The lamexec command is similar to mpirun but is used for non-MPI
> programs"
>
> So, the question is this: What version(s) of LAM support "lamexec"? Our
> earlier version of LAM, "lam-6.5.8-4", worked just fine using "lamexec".

Well that's quite odd -- lamexec should work for *all* versions of LAM.

> Code that generated Error Messages
> *********************************************
> .......
> lamboot -v /tmp/lam_boot.$PBS_JOBID

Note that in the 7.x series, you shouldn't need the boot schema file
(indeed, it's ignored). LAM will directly obtain the list of hosts
to use from PBS/Torque.

> lamexec -w -v schema1

What are the contents of the schema1 file?

FWIW: I just downloaded and installed LAM 7.1.4 and lamexec seemed to
work for me:

-----
[5:49] svbu-mpi:/home/jsquyres/lam-7.1.4 % lamboot

LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University

[5:49] svbu-mpi:/home/jsquyres/lam-7.1.4 % lamexec N hostname
svbu-mpi.cisco.com
[5:49] svbu-mpi:/home/jsquyres/lam-7.1.4 % cat schema
N hostname
[5:50] svbu-mpi:/home/jsquyres/lam-7.1.4 % lamexec -w -v schema
6442 hostname running on n0 (o)
svbu-mpi.cisco.com
[5:50] svbu-mpi:/home/jsquyres/lam-7.1.4 %
-----
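
If it helps to isolate the problem, the test above can be wrapped in a small script along these lines. This is only a sketch: the scratch path for the schema file is made up for illustration, and the script assumes that lamboot, lamexec and lamhalt are on the PATH (it just reports where the schema was written if they are not).

```shell
#!/bin/sh
# Sketch: write a one-line application schema and run it with lamexec.
# The scratch path below is hypothetical; use any writable location.
schema="${TMPDIR:-/tmp}/lamexec_schema.$$"

# Same schema contents as in the session above: run "hostname" on all nodes.
printf 'N hostname\n' > "$schema"

if command -v lamexec >/dev/null 2>&1; then
    # Boot LAM, run the schema, then shut LAM down again.
    if lamboot; then
        lamexec -w -v "$schema"
        lamhalt
    fi
else
    echo "lamexec not found; schema written to $schema"
fi
```

If this works but your real job does not, the difference is most likely in the contents of schema1 or in how the job script invokes it.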

> Error Messages
> *********************
> n-1<24782> ssi:boot:base:linear: booting n0 (xxxxxxxxxx)
> n-1<24782> ssi:boot:base:linear: booting n1 (yyyyyyyyy)
> n-1<24782> ssi:boot:base:linear: finished
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
>
> J.W. (Pat) O'Bryant,Jr.
> Business Line Infrastructure
> Technical Systems, HPC
> Office: 713-431-7022
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
Cisco Systems