LAM/MPI General User's Mailing List Archives

From: Timothy I Mattox (tmattox_at_[hidden])
Date: 2004-04-09 20:50:11


Hello,
While testing the new Warewulf 2.1, I discovered that the LAM 7.0.4
test suite can incorrectly report "PASS" for a test that has actually
failed. Specifically, if a dynamic library is missing on some ranks other
than rank 0, the tests won't run successfully, but will say "PASS" anyway...

Here is a snippet of output from a lamtests "make -k check", with rank 0
having a full Linux install and rank 1 having a stripped-down Linux
that is accidentally missing a dynamic C++ library:

mpirun -x TEST -s h C -ssi rpi usysv
/home/users/tmattox/lamtests-7.0.4/info/./00_create_cxx
/home/users/tmattox/lamtests-7.0.4/info/./00_create_cxx: error while
loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared
object file: No such file or directory
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
PASS: 00_create_cxx

Hopefully it won't be very hard to check for this failure mode.
Unfortunately, I'm too busy at the moment to find a solution myself (other
than to install the missing library on my nodes ;-)
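
As a rough illustration of what such a check could look like, here is a minimal sketch in Python. It is not how lamtests actually decides PASS/FAIL (that logic is not shown here); the function name classify_run and the list of failure markers are hypothetical, and the marker strings are simply taken from the output quoted above. The idea is to treat a run as failed if the launcher's exit status is non-zero or if its captured output contains a known failure message.

```python
import re

# Hypothetical failure markers, copied from the output quoted above.
# A real harness might need more (or different) patterns.
FAILURE_PATTERNS = [
    re.compile(r"error while loading shared libraries"),
    re.compile(r"did not invoke MPI_INIT"),
]


def classify_run(exit_status: int, output: str) -> str:
    """Return "PASS" only if the exit status is 0 and no failure marker appears."""
    if exit_status != 0:
        return "FAIL"
    # Collapse all whitespace so a message wrapped across lines still matches.
    flattened = " ".join(output.split())
    for pattern in FAILURE_PATTERNS:
        if pattern.search(flattened):
            return "FAIL"
    return "PASS"
```

With this sketch, a run whose captured output contains the "error while loading shared libraries" message would be reported as FAIL even if the launcher itself exited with status 0. Whether mpirun returns a non-zero status in the situation above is not something the quoted output shows, which is why the sketch checks the output text as well.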

-- 
Tim Mattox - tmattox_at_[hidden] - http://homepage.mac.com/tmattox/
    http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/