I am trying to run a LAM-MPI program under LSF.
Because we have HP's MPI, we do not normally run the LAM-MPI daemons on the
machine; this is for one user whose software requires LAM-MPI.
Since it's only for one user, we _really_ don't want to run the daemons
except when he submits a job.
I wrote a Perl script that forks off a process to run lamboot (I had to
fork because the system call running lamboot doesn't return until lamd
stops). The main process waits 10 seconds, runs the program, and
then runs lamhalt.
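
For reference, the wrapper's logic is roughly the following. This is a minimal
sketch in Python rather than the actual Perl script; the function name
run_with_lam and the command lists passed to it are illustrative assumptions.

```python
import subprocess
import sys
import time


def run_with_lam(boot_cmd, job_cmd, halt_cmd, delay=10):
    """Start boot_cmd in the background, run job_cmd, then run halt_cmd.

    Returns the exit status of job_cmd.
    """
    # Start the boot command without waiting for it, since (as described
    # above) the call running lamboot does not return until lamd stops.
    boot = subprocess.Popen(boot_cmd)
    try:
        # Give the daemon time to come up before starting the job.
        time.sleep(delay)
        job_status = subprocess.call(job_cmd)
    finally:
        # Always shut the daemons down, even if the job failed to start.
        subprocess.call(halt_cmd)
        # Reap the background boot process once the daemon has stopped.
        boot.wait()
    return job_status
```

It would be called along the lines of
run_with_lam(["lamboot"], ["mpirun", "-np", "4", "./my_mpi_program"], ["lamhalt"]),
where ./my_mpi_program and the process count are placeholders for the user's
actual job.
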
Run interactively, this works with no problems.
Run via LSF, the program reports that there is no lamd running on this
host, even though lamd is running: I modified the job to do a ps -ef and
search for lamd, and it's there.
Any suggestions on how to get the MPI job to recognize the lamd that's
running?