I've been doing this for a few years without problems - I use LAM and LSF on
HP11 and Linux. You don't mention whether you're using the LSF Parallel
product - I don't.
Actually, I run lamboot from a shell script, and I don't fork - it does
return as soon as LAM is booted. I use:
    lamboot -x hostfile
I build the LAM app schema from the LSB_MCPU_HOSTS variable. I start the
LAM daemons for each job that runs - I don't want to run the daemons on all
boxes either.
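
A minimal sketch of that step, assuming the hostfile handed to lamboot is built from the same variable. LSF sets LSB_MCPU_HOSTS to alternating host names and slot counts ("hostA 2 hostB 4"), and a LAM boot schema takes one "host cpu=N" line per machine. The helper name make_lam_hostfile and the hostfile path in the comments are made up for illustration; they are not part of LAM or LSF.

```shell
#!/bin/sh
# Sketch: turn LSF's LSB_MCPU_HOSTS ("hostA 2 hostB 4") into LAM boot schema lines.
# make_lam_hostfile is a hypothetical helper name, not part of LAM or LSF.
make_lam_hostfile() {
    # $1: a string in LSB_MCPU_HOSTS format (alternating host names and slot counts).
    # Left unquoted on purpose so the shell splits the list on whitespace.
    set -- $1
    while [ "$#" -ge 2 ]; do
        echo "$1 cpu=$2"
        shift 2
    done
}

# Inside a job it might be used like this (the hostfile path is an assumption):
#   make_lam_hostfile "$LSB_MCPU_HOSTS" > "$HOME/lamhosts.$LSB_JOBID"
#   lamboot -x "$HOME/lamhosts.$LSB_JOBID"
```

Writing the file per job keeps the daemons confined to the hosts LSF actually allocated.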
I would suggest looking at why your lamboot doesn't return immediately, and
also looking for the lam daemons on the remote machine as well. Also, I
don't understand why you need the sleep mechanism for lamhalt - I run mpirun
and wait for the job to finish, then run lamhalt. I would take the lamhalt
out of your scripts for testing to be sure you aren't blowing away your
daemons before their time. There's no need for a timer there (I do sleep a
second after I run the lamhalt and then do some checking before the script
exits - I probably don't need to do that either).
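
The sequence described above (boot, run, then halt, with no fork and no timer) could be sketched roughly as follows. The wrapper name run_lam_job and its argument layout are hypothetical, and the "C" argument to mpirun (one process per CPU in the booted LAM) is one possible choice, not something stated in this thread.

```shell
#!/bin/sh
# Sketch of the boot / run / halt sequence with no timers.
# run_lam_job is a hypothetical wrapper name; $1 is a LAM hostfile and the
# remaining arguments are the MPI program followed by its own arguments.
run_lam_job() {
    hostfile=$1
    shift

    # lamboot is expected to return once the daemons are up, so there is
    # no fork and no sleep here; give up if the boot fails.
    lamboot -x "$hostfile" || return 1

    # mpirun blocks until the MPI program finishes.
    # "C" asks for one process per CPU (an assumption, see above).
    mpirun C "$@"
    status=$?

    # Halt the daemons only after mpirun has returned.
    lamhalt

    # Pass the MPI program's exit status back to the caller (and so to LSF).
    return "$status"
}
```

For testing, the lamhalt line can be commented out to confirm the daemons are not being shut down early.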
Dave
-----Original Message-----
From: Charles F. Fisher [mailto:chuck_at_[hidden]]
Sent: Thursday, January 22, 2004 10:11 AM
To: lam_at_[hidden]
Subject: LAM: LAM MPI under LSF
I am trying to run a LAM-MPI program under LSF.
Because we have HP's mpi, we are not running the lam-mpi daemons on the
machine normally; this is for one user with software that requires lam-mpi.
Since it's only for one user, we _really_ don't want to run the daemons
except when he submits a job.
I wrote a perl script that forks off a process that runs lamboot (had to
fork because the system call running lamboot doesn't return until lamd
stops) - the main process waits 10 seconds, runs the program, and
then runs lamhalt.
Run interactively, this works with no problems.
Run via LSF, it gets a message saying that there is no lamd running on this
host, though lamd is running - I modified the job to do a ps -ef searching for
lamd, and it's there.
Any suggestions on how to get the mpi job to recognize the lamd that's
running?
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/