Hi,
You could try testing whether the return code from "tping" is 215 (the exit
code if the daemons are not running).
e.g.
EXIT_CODE=0
if [ $npart -le 1 ]; then
    program.exe
else
    tping >/dev/null 2>&1 || EXIT_CODE=$?
    if [ $EXIT_CODE -eq 215 ] ; then
        lamboot -v
    fi
    mpiexec -machinefile ... -n 1 master.exe <config
fi
"tping" is part of the lam-mpi distribution. My system is down at present, so I
can't check this part.
The "|| EXIT_CODE=$?" construct should allow the job to carry on even if the
user has "set -e" in effect to abort on errors: a command on the left-hand side
of "||" does not trigger the abort when it fails.
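
As a rough illustration of that last point, here is a minimal sketch that does not need LAM at all. "fake_tping" is a hypothetical stand-in I made up for the example; it simply returns 215, the way "tping" is expected to when the daemons are not running.

```shell
#!/bin/sh
# Abort the script as soon as any unguarded command fails.
set -e

# Hypothetical stand-in for tping (not part of LAM): always fails with 215.
fake_tping() {
    return 215
}

EXIT_CODE=0
# The failure is on the left-hand side of "||", so "set -e" does not abort;
# the right-hand side records the exit status instead.
fake_tping >/dev/null 2>&1 || EXIT_CODE=$?

if [ "$EXIT_CODE" -eq 215 ]; then
    # In the real job this is where "lamboot -v" would go.
    echo "daemons not running (exit code $EXIT_CODE)"
fi
echo "job carries on"
```

If you take the "|| EXIT_CODE=$?" part off the "fake_tping" line, the script should stop at that line and never reach the final echo.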
Regards
Neil
--
+-----------------+---------------------------------+------------------+
| Neil Storer | Head: Systems S/W Section | Operations Dept. |
+-----------------+---------------------------------+------------------+
| ECMWF, | email: neil.storer_at_[hidden] | //=\\ //=\\ |
| Shinfield Park, | Tel: (+44 118) 9499353 | // \\// \\ |
| Reading, | (+44 118) 9499000 x 2353 | ECMWF |
| Berkshire, | Fax: (+44 118) 9869450 | ECMWF |
| RG2 9AX, | | \\ //\\ // |
| UK | URL: http://www.ecmwf.int/ | \\=// \\=// |
+--+--------------+---------------------------------+----------------+-+
| ECMWF is the European Centre for Medium-Range Weather Forecasts |
+-----------------------------------------------------------------+