On Fri, Jan 30, 2004 at 08:23:07AM -0500, Jeff Squyres wrote:
> On Fri, 30 Jan 2004, Charles F. Fisher wrote:
>
> > LAM 6.5.9; LSF 5.1. On HPUX 11.23 if that helps. Are there any
> > specific environment variables, hostname or other information you would
> > like?
>
> Hah! I literally just removed my 6.5.9 CVS checkout yesterday. :-)
>
> After re-checking it out, I think we *may* have not quite had it right in
> 6.5.9 (we don't have any LSF systems ourselves to test with). We're
> specifically looking at the LSB_JOBID and LSB_JOBINDEX environment
> variables; they should be automatically created and populated by LSF
> itself.
>
> Is there any chance that you could give 7.0.4 a whirl?
>
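As a quick check on the LSB_JOBID / LSB_JOBINDEX point above, a small helper like the following could be run from inside the LSF job script to confirm that both variables are visible there and to see what session name they produce. This is a sketch under an assumption, not LAM's code: the "lsf-<LSB_JOBID>-<LSB_JOBINDEX>" format is inferred from the "-b lsf-11557-0" argument that appears in the hboot/lamd output below, and the function name lsf_session_suffix is hypothetical.

```python
import os
from typing import Mapping, Optional


def lsf_session_suffix(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Build an LSF-derived session name from the job's environment.

    Assumption (hypothetical helper): the "lsf-<LSB_JOBID>-<LSB_JOBINDEX>"
    format is inferred from the "-b lsf-11557-0" value in the boot log,
    not taken from the LAM source.
    """
    job_id = env.get("LSB_JOBID")
    job_index = env.get("LSB_JOBINDEX")
    # Outside an LSF job these variables are normally not set at all.
    if not job_id or job_index is None:
        return None
    return f"lsf-{job_id}-{job_index}"


if __name__ == "__main__":
    # Prints the derived name, or None if the LSF variables are missing.
    print(lsf_session_suffix())
```

For the job in the log, lsf_session_suffix({"LSB_JOBID": "11557", "LSB_JOBINDEX": "0"}) returns "lsf-11557-0", matching the -b argument passed to lamd. If lamboot and the MPI processes see different values (or none), that would be worth knowing.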
LAM came with a third-party software package; we don't have a source-code
license, but I can ask the vendor if 7.0.4 is a possibility.
I tried to run a job, using lamboot -dx lam-bhost.def followed by lamnodes,
then mpirun, then lamhalt; here's the output:
Job </home/chuck/test4.lam.csh /scratch/chuck/vviti/AeroSoft/bin-hpux10-lam/gasp --mpi -iR532_Ramp_RS.xml --block 3> was submitted from host <sdx0.uky.edu> by user <chuck>.
Job was executed on host(s) <2*sdx1.uky.edu>, in queue <parashort>, as user <chuck>.
</home/chuck> was used as the home directory.
</scratch/chuck/vviti/Ramp/Jet/VirginiaTechMa40PR532_Ramp_RS> was used as the working directory.
Started at Fri Jan 30 08:48:08 2004
Results reported at Fri Jan 30 08:54:38 2004
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
/home/chuck/test4.lam.csh /scratch/chuck/vviti/AeroSoft/bin-hpux10-lam/gasp --mpi -iR532_Ramp_RS.xml --block 3
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 1.33 sec.
Max Memory : 11 MB
Max Swap : 12 MB
Max Processes : 4
The output (if any) follows:
4
4
LSB_JOBID 11557
LSB_JOBINDEX 0
hboot: process schema = "/scratch/chuck/vviti/AeroSoft/lam-6.5/etc/lam-conf.lam"
hboot: found /scratch/chuck/vviti/AeroSoft/lam-6.5/bin/lamd
hboot: performing tkill
hboot: tkill -b lsf-11557-0
hboot: booting...
hboot: fork /scratch/chuck/vviti/AeroSoft/lam-6.5/bin/lamd
[1] 16299 lamd -x -H 128.163.15.11 -P 61163 -n 0 -o 0 -b lsf-11557-0 -d
LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
lamboot: boot schema file: lam-bhost.def
lamboot: opening hostfile lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 sdx1.uky.edu
lamboot: resolved hosts:
lamboot: n0 sdx1.uky.edu --> 128.163.15.11
lamboot: found 1 host node(s)
lamboot: origin node is 0 (sdx1.uky.edu)
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -b lsf-11557-0 -I " -x -H 128.163.15.11 -P 61163 -n 0 -o 0 -b lsf-11557-0 ""
lamboot completed successfully
n0 sdx1.uky.edu:2
-----------------------------------------------------------------------------
It seems that there is no lamd running on this host, which indicates
that the LAM/MPI runtime environment is not operating. The LAM/MPI
runtime environment is necessary for MPI programs to run (the MPI
program tried to invoke the "MPI_Init" function).
Please run the "lamboot" command to start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with mpirun
did not invoke MPI_INIT before quitting (it is possible that more than
one process did not invoke MPI_INIT -- mpirun was only notified of the
first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University