If LAM really is booted on both nodes, you can double check this with
the "tping" command, for example:
    tping -c 2 N
and ensure that both nodes can be "seen". Also run lamnodes and verify
that LAM thinks that there is only 1 CPU on each machine (i.e., that
mpirun is not trying to run 2 copies of reduc on one machine).
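
The two checks above could be wrapped up roughly as follows. This is
only a sketch: check_lam is a made-up helper name, and it uses nothing
beyond the tping and lamnodes invocations already mentioned.

```shell
#!/bin/sh
# Sketch: sanity-check a booted LAM universe before calling mpirun.
# check_lam is a hypothetical helper name, not part of LAM.
check_lam() {
    # Make sure the LAM tools are reachable from this shell at all.
    for cmd in tping lamnodes; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd (is LAM in your PATH?)"
            return 1
        fi
    done
    # Ping every booted node twice; stop if any node cannot be "seen".
    tping -c 2 N || return 1
    # List the nodes; check the CPU count that LAM reports for each one.
    lamnodes
}
```

Run check_lam after lamboot; if it returns non-zero, fix the LAM setup
before trying mpirun again.
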
It sounds like mpirun tried to launch 2 copies of reduc -- and as far
as it knows, it *did* launch 2 copies of reduc (probably one on each
node), but the reduc that it found on one machine was not an MPI
process. Specifically, you *should* get a "file not found" error if it
can't find reduc on one machine. So it must be finding it, but perhaps
it's finding the "wrong" reduc (i.e., one that is not an MPI process
and does not call MPI_INIT)...?
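
One rough way to test the "wrong reduc" theory on a given node is to
see whether the reduc found in the PATH mentions MPI_Init at all. This
is a heuristic sketch, not a LAM feature: looks_like_mpi is a made-up
helper, and a stripped, statically linked binary may not contain the
string even if it is a real MPI program.

```shell
#!/bin/sh
# Heuristic sketch: does a binary mention MPI_Init at all?
# looks_like_mpi is a hypothetical helper, not a LAM command.
looks_like_mpi() {
    # -q: quiet; -i: also match Fortran-style spellings such as mpi_init_
    if grep -qi 'mpi_init' "$1" 2>/dev/null; then
        echo "$1: references MPI_Init"
    else
        echo "$1: no MPI_Init reference found"
        return 1
    fi
}

# Check whichever reduc comes first in this node's PATH, if there is one.
if prog=$(command -v reduc); then
    looks_like_mpi "$prog" || echo "this may be the wrong reduc"
fi
```

Run it on each node in turn and compare the paths that it prints. It
may also be possible to ask every node at once with something like
"lamexec N which reduc", assuming lamexec accepts the same N notation
as tping.
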
On Sep 9, 2004, at 12:30 PM, Pravin R Joshi wrote:
> Hi,
> I am trying to get LAM/MPI 7.0.6 working on two nodes of a cluster
> using RedHat Linux 9. I installed a rpm copy on one of the nodes and
> from source on another node. Now when I do a lamboot -v hostfile
> (hostfile has the names of the two machines) lam is booted on both
> the nodes, but when I run an mpi program (eg.: mpirun -np 2 reduc),
> only one instance of the mpirun is started. This one is on the node
> in which I did a source install. The other node does not start the
> mpirun.
> At the end of the mpirun I get the following error.
> -----------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------
> Can someone help with this please.
> Pravin
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/