LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-03-24 13:12:22


This is something very similar to what I do -- run any LAM command and
check its return code. I tend to use "lamnodes", but tping is just as
good. Here's what I have in some batch scripts, for example
(csh-flavored; similar concept for Bourne shell):

-----
lamnodes > /dev/null
if ($status != 0) then
    lamboot -v ...
endif
mpirun ...
-----

So I wouldn't check for a particular exit code; those really aren't
published and are subject to change at any time. :-) But if it's not
zero, then it's likely that there's no lamd running.
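For Bourne-shell users, the same idea might look roughly like the sketch below. It is only an illustration, not something shipped with LAM: the function name ensure_lam and the LAM_PROBE / LAM_BOOT variables are hypothetical, added so the logic can be tried without a LAM installation. Only "any non-zero status" is tested, never a specific exit code.

```shell
#!/bin/sh
# Sketch: boot LAM only when a probe command fails.
# ensure_lam, LAM_PROBE and LAM_BOOT are hypothetical names, not part of LAM.

ensure_lam() {
    # Run the probe (lamnodes by default) and discard its output.
    # A zero status is taken to mean that a lamd is already running.
    if ${LAM_PROBE:-lamnodes} >/dev/null 2>&1; then
        return 0
    fi
    # Any non-zero status: assume no lamd and boot one, passing the
    # caller's arguments through to the boot command unchanged.
    ${LAM_BOOT:-lamboot -v} "$@"
}
```

Call ensure_lam with whatever arguments you would normally give lamboot, then run mpirun as usual. Because the probe sits in an "if" condition, its failure does not abort a script that uses "set -e".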

On Wed, 24 Mar 2004, Neil Storer wrote:

> You could try testing the return code for "215" (the exit code if the
> daemons are not running).
>
> e.g.
> EXIT_CODE=0
> if [ $npart -le 1 ]; then
>     program.exe
> else
>     tping >/dev/null 2>&1 || EXIT_CODE=$?
>     if [ $EXIT_CODE -eq 215 ] ; then
>         lamboot -v
>     fi
>     mpiexec -machinefile ... -n 1 master.exe <config
> fi
>
>
> "tping" is part of the lam-mpi distribution. My system is down at present, so I
> can't check this part.
>
> The "|| EXIT_CODE=$?" construct should allow the job to carry on even if the
> user is using a "set -e" to abort on errors.
>
> Regards
> Neil
>
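
The "|| EXIT_CODE=$?" behaviour under "set -e" can be checked without LAM at all. In the sketch below, "sh -c 'exit 215'" is only a stand-in for a failing tping (the value 215 is taken from the message above, not verified), and capture_demo is a hypothetical name. A command on the left-hand side of "||" is exempt from "set -e", so the function carries on and reports the captured status.

```shell
#!/bin/sh
# Sketch: capture a failing command's status without tripping "set -e".
# capture_demo is a hypothetical name; the body runs in a subshell so the
# "set -e" setting does not leak into the caller.

capture_demo() (
    set -e
    EXIT_CODE=0
    # Stand-in for a failing probe; "||" keeps "set -e" from aborting here.
    sh -c 'exit 215' >/dev/null 2>&1 || EXIT_CODE=$?
    # Still running: report the status that was captured.
    echo "$EXIT_CODE"
)
```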

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/