Hello all,
I have just installed a small testbed for our cluster consisting of only
2 computers. I have installed clustermatic 5 on top of Fedora Core 3.
Booting and bpsh'ing works fine, but I have some trouble getting MPI
programs to work with LAM/MPI.
I found some related postings in the archives, but no clues on how to solve the problem.
The following issue concerns LAM/MPI 7.1.1-2 and the latest SVN snapshot
(7.2b1r10023). I compiled both from scratch using gcc/g77 and
gcc/nagware. I can lamboot without any problems using the bproc ssi boot
module and tping reports that it can find all computers (master and 1
node).
Then I try to start one of the examples contained in the LAM/MPI distro,
e.g. the pi one. As soon as I start "mpirun n0-1
PATH_TO_LAM/example/fpi", I get the following message on the node's
console:
"bproc: WARNING: bproc/move.c: 1886: send_recv_process needs to be
reworked to be consistent with the rest of the move code"
And on the master the mpirun program reports:
"-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that more
than one process did not invoke MPI_INIT -- mpirun was only notified of
the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that invoke
MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program to run
non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------"
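For anyone who wants to reproduce this, the sequence described above can be sketched roughly as the script below. This is only a sketch under assumptions: the `run` and `lam_smoke_test` helpers and the `LAM_PREFIX` and `RUN_LAM_CHECKS` variables are hypothetical names made up for illustration, the exact options may need adjusting for your installation, and the `lamexec` line is an extra check taken from the hint in the mpirun message, not something reported above.

```shell
#!/bin/sh
# Sketch of the boot / ping / run sequence from this post.
# LAM_PREFIX is a hypothetical variable; PATH_TO_LAM is the placeholder used above.
LAM_PREFIX="${LAM_PREFIX:-PATH_TO_LAM}"

# Hypothetical helper: print a command, then run it only if the tool is installed.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "  (skipped: $1 not found)"
    fi
}

lam_smoke_test() {
    run lamboot -ssi boot bproc                  # boot LAM with the bproc boot module
    run tping N -c 3                             # ping every booted node three times
    run lamexec n0-1 hostname                    # non-MPI program, as the mpirun text suggests
    run mpirun n0-1 "$LAM_PREFIX/example/fpi"    # the failing MPI example
}

# Only run the checks when explicitly requested (assumed opt-in switch).
if [ "${RUN_LAM_CHECKS:-0}" = 1 ]; then
    lam_smoke_test
fi
```

Running it with RUN_LAM_CHECKS=1 on the master prints each command before executing it, so the step at which things go wrong is visible in the output.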
Has anybody experienced similar problems, or does anyone have a tip on
how to verify that my setup is basically OK? Any help would be much
appreciated. Thanks in advance.
Alex