Charles,
I've got it working [after hitting the same symptoms you reported: works on
head node(s), but not in batch] by just adding this to my batch script:
unsetenv LSB_JOBID
lamboot...
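
In case it helps, here's a rough sketch of the whole script, assuming csh
(the program name and mpirun arguments are placeholders, not what I actually
run); my guess is that with LSB_JOBID unset, lamboot skips its native LSF
boot path and falls back to the ordinary rsh-style boot from the schema file:

#!/bin/csh
# Workaround sketch: LAM 6.5.9 keys its LSF boot off LSB_JOBID (see
# Jeff's note below), so clearing it makes lamboot use the plain
# rsh-style boot from the boot schema file instead.
unsetenv LSB_JOBID

lamboot -v lam-bhost.def        # boot schema listing the allocated hosts
mpirun -np 2 ./my_mpi_program   # placeholder program and process count
lamhalt                         # tear the LAM universe back down
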
-Galen Arnold
On Friday 30 January 2004 08:04 am, Charles F. Fisher wrote:
> On Fri, Jan 30, 2004 at 08:23:07AM -0500, Jeff Squyres wrote:
> > On Fri, 30 Jan 2004, Charles F. Fisher wrote:
> > > LAM 6.5.9; LSF 5.1, on HP-UX 11.23, if that helps. Are there any
> > > specific environment variables, hostname or other information you would
> > > like?
> >
> > Hah! I literally just removed my 6.5.9 CVS checkout yesterday. :-)
> >
> > After re-checking it out, I think we *may* have not quite had it right in
> > 6.5.9 (we don't have any LSF systems ourselves to test with). We're
> > specifically looking at the LSB_JOBID and LSB_JOBINDEX environment
> > variables; they should be automatically created and populated by LSF
> > itself.
> >
> > Is there any chance that you could give 7.0.4 a whirl?
>
> LAM came with a third-party software package; we don't have a source-code
> license, but I can ask the vendor if 7.0.4 is a possibility.
>
> I tried to run a job, using lamboot -dx lam-bhost.def followed by lamnodes,
> then mpirun, then lamhalt; here's the output:
>
>
> Job </home/chuck/test4.lam.csh /scratch/chuck/vviti/AeroSoft/bin-hpux10-lam/gasp --mpi -iR532_Ramp_RS.xml --block 3> was submitted from host <sdx0.uky.edu> by user <chuck>.
> Job was executed on host(s) <2*sdx1.uky.edu>, in queue <parashort>, as user <chuck>.
> </home/chuck> was used as the home directory.
> </scratch/chuck/vviti/Ramp/Jet/VirginiaTechMa40PR532_Ramp_RS> was used as the working directory.
> Started at Fri Jan 30 08:48:08 2004
> Results reported at Fri Jan 30 08:54:38 2004
>
> Your job looked like:
>
> ------------------------------------------------------------
> # LSBATCH: User input
> /home/chuck/test4.lam.csh /scratch/chuck/vviti/AeroSoft/bin-hpux10-lam/gasp --mpi -iR532_Ramp_RS.xml --block 3
> ------------------------------------------------------------
>
> Successfully completed.
>
> Resource usage summary:
>
> CPU time : 1.33 sec.
> Max Memory : 11 MB
> Max Swap : 12 MB
>
> Max Processes : 4
>
> The output (if any) follows:
>
> 4
> 4
> LSB_JOBID 11557
> LSB_JOBINDEX 0
> hboot: process schema = "/scratch/chuck/vviti/AeroSoft/lam-6.5/etc/lam-conf.lam"
> hboot: found /scratch/chuck/vviti/AeroSoft/lam-6.5/bin/lamd
> hboot: performing tkill
> hboot: tkill -b lsf-11557-0
> hboot: booting...
> hboot: fork /scratch/chuck/vviti/AeroSoft/lam-6.5/bin/lamd
> [1] 16299 lamd -x -H 128.163.15.11 -P 61163 -n 0 -o 0 -b lsf-11557-0 -d
>
> LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
>
> lamboot: boot schema file: lam-bhost.def
> lamboot: opening hostfile lam-bhost.def
> lamboot: found the following hosts:
> lamboot: n0 sdx1.uky.edu
> lamboot: resolved hosts:
> lamboot: n0 sdx1.uky.edu --> 128.163.15.11
> lamboot: found 1 host node(s)
> lamboot: origin node is 0 (sdx1.uky.edu)
> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -b lsf-11557-0 -I " -x -H 128.163.15.11 -P 61163 -n 0 -o 0 -b lsf-11557-0 ""
> lamboot completed successfully
> n0 sdx1.uky.edu:2
> -----------------------------------------------------------------------------
> It seems that there is no lamd running on this host, which indicates that
> the LAM/MPI runtime environment is not operating. The LAM/MPI runtime
> environment is necessary for MPI programs to run (the MPI
> program tried to invoke the "MPI_Init" function).
>
> Please run the "lamboot" command to start the LAM/MPI runtime
> environment. See the LAM/MPI documentation for how to invoke
> "lamboot" across multiple machines.
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with mpirun
> did not invoke MPI_INIT before quitting (it is possible that more than one
> process did not invoke MPI_INIT -- mpirun was only notified of the first
> one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
> LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Galen Arnold, consulting group--system engineer
National Center for Supercomputing Applications
605 E. Springfield Avenue    (217) 244-3473
Champaign, IL 61820          arnoldg_at_[hidden]