On Sep 7, 2007, at 3:27 PM, pat.o'bryant_at_[hidden] wrote:
> We have a group of users that uses "lamexec" along with an application
> schema file to execute MPMD jobs. Since we upgraded to "lam-7.1.2.8",
> "lamexec" fails with the messages shown below. Interestingly, the users
> guide for "lam 7.1.2" has the following text:
>
> "The lamexec command is similar to mpirun but is used for non-MPI
> programs"
>
> So, the question is this: What version(s) of LAM support "lamexec"? Our
> earlier version of LAM, "lam-6.5.8-4", worked just fine using "lamexec".
Well that's quite odd -- lamexec should work for *all* versions of LAM.
> Code that generated Error Messages
> *********************************************
> .......
> lamboot -v /tmp/lam_boot.$PBS_JOBID
Note that in the 7.x series, you shouldn't need the boot schema file
(indeed, it's ignored). LAM will directly obtain the list of hosts
to use from PBS/Torque.
> lamexec -w -v schema1
What are the contents of the schema1 file?
FWIW: I just downloaded and installed LAM 7.1.4 and lamexec seemed to
work for me:
-----
[5:49] svbu-mpi:/home/jsquyres/lam-7.1.4 % lamboot
LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
[5:49] svbu-mpi:/home/jsquyres/lam-7.1.4 % lamexec N hostname
svbu-mpi.cisco.com
[5:49] svbu-mpi:/home/jsquyres/lam-7.1.4 % cat schema
N hostname
[5:50] svbu-mpi:/home/jsquyres/lam-7.1.4 % lamexec -w -v schema
6442 hostname running on n0 (o)
svbu-mpi.cisco.com
[5:50] svbu-mpi:/home/jsquyres/lam-7.1.4 %
-----
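If it helps to narrow things down, the same check can be scripted inside the PBS job. This is only a rough sketch: the temporary schema file and its one-line contents ("N hostname") are stand-ins for your real schema1, and it assumes lamboot has already been run in the job. Since the error text you quoted is printed by mpirun, it may also be worth seeing which lamexec and mpirun the job script actually picks up from its PATH.

```shell
# Hypothetical one-line application schema, standing in for schema1.
schema=$(mktemp)
printf 'N hostname\n' > "$schema"

if command -v lamexec >/dev/null 2>&1; then
    # Print the paths of the binaries the job would actually run.
    command -v lamexec
    command -v mpirun || echo "mpirun not found in PATH"
    # Same invocation as in the failing job, but with the minimal schema.
    lamexec -w -v "$schema"
else
    echo "lamexec not found in PATH"
fi
```

If the minimal schema runs but schema1 does not, the difference between the two files is the first place to look.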
> Error Messages
> *********************
> n-1<24782> ssi:boot:base:linear: booting n0 (xxxxxxxxxx)
> n-1<24782> ssi:boot:base:linear: booting n1 (yyyyyyyyy)
> n-1<24782> ssi:boot:base:linear: finished
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
>
> J.W. (Pat) O'Bryant,Jr.
> Business Line Infrastructure
> Technical Systems, HPC
> Office: 713-431-7022
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems