This is very similar to what I do: run any LAM command and check its
return code. I tend to use "lamnodes", but "tping" works just as
well. For example, here's what I have in some of my batch scripts
(csh syntax; the same idea works in Bourne shell):
-----
lamnodes > /dev/null
if ($status != 0) then
lamboot -v ...
endif
mpirun ...
-----
So I wouldn't check for a particular exit code; the specific values
aren't published and are subject to change at any time. :-) But if
the status is non-zero, then it's likely that no lamd is running.
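For Bourne shell, a minimal sketch of the same check might look like
the following. It assumes only that lamnodes and lamboot are on the
PATH; the function name ensure_lam is hypothetical, and any lamboot
arguments (the "..." above, e.g. a boot schema) are passed through by
the caller. Like the csh version, it only distinguishes zero from
non-zero.

```shell
#!/bin/sh
# Bourne-shell version of the csh check above (a sketch): run a LAM
# command and boot LAM only if that command fails. Only "zero vs.
# non-zero" is tested, since the specific exit codes are not published.

ensure_lam() {
    # A non-zero status from lamnodes most likely means no lamd is
    # running. Because it is an "if" condition, a failure here will
    # not abort a script that uses "set -e".
    if ! lamnodes >/dev/null 2>&1; then
        # Whatever arguments the caller passes (boot schema, etc.)
        # are handed straight to lamboot.
        lamboot -v "$@"
    fi
}
```

A batch script would then call "ensure_lam ..." with its usual lamboot
arguments, followed by "mpirun ..." as before.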
On Wed, 24 Mar 2004, Neil Storer wrote:
> You could try testing the return code for "215" (the exit code if the
> daemons are not running).
>
> e.g.
> EXIT_CODE=0
> if [ $npart -le 1 ]; then
>     program.exe
> else
>     tping >/dev/null 2>&1 || EXIT_CODE=$?
>     if [ $EXIT_CODE -eq 215 ] ; then
>         lamboot -v
>     fi
>     mpiexec -machinefile ... -n 1 master.exe <config
> fi
>
>
> "tping" is part of the lam-mpi distribution. My system is down at present, so I
> can't check this part.
>
> The "|| EXIT_CODE=$?" construct should allow the job to carry on even if the
> user is using a "set -e" to abort on errors.
>
> Regards
> Neil
>
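Regarding the "|| EXIT_CODE=$?" construct in the quoted message: a
command on the left-hand side of "||" is exempt from "set -e", and $?
on the right-hand side still holds that command's exit status. A small
sketch to check this behaviour (probe_status is a hypothetical helper
name, not part of LAM):

```shell
#!/bin/sh
# Sketch: under "set -e" a failing command normally aborts the script,
# but not when it is the left-hand side of "||". The right-hand side
# then sees the failed command's status in $?.
set -e

probe_status() {
    # Run the given command with output discarded, and print its exit
    # status instead of letting a failure abort the script.
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    echo "$rc"
}
```

For example, "status=$(probe_status lamnodes)"; per the advice above,
I'd compare that against zero rather than any particular value.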
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/