LAM/MPI General User's Mailing List Archives

From: Pravin R Joshi (pravenj_at_[hidden])
Date: 2004-09-09 14:12:16


After recompiling the code, I no longer get the error message. Now the
program runs properly, but only on one node. The second node still does
not show an mpirun when I run ps -e on it. Also, the first node shows
only 1 instance of mpirun with ps -e. The command that I use is
mpirun -np 2 reduc.
Thanks.
Pravin
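
A note on the ps -e check: mpirun is normally just the launcher and runs only on the node where it was invoked, so the second node would be expected to show the application (reduc) and lamd rather than mpirun. A minimal Python sketch for counting a command in saved ps -e output follows. The helper name count_command and the sample listing are hypothetical, and the column layout (PID, TTY, TIME, CMD) is assumed to be the usual Linux procps one.

```python
def count_command(ps_output: str, command: str) -> int:
    """Count rows of `ps -e` output whose CMD column equals `command`."""
    count = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # PID, TTY, TIME, CMD
        if len(fields) == 4 and fields[3].strip() == command:
            count += 1
    return count


# Hypothetical sample listing, not taken from the poster's machines.
SAMPLE = """\
  PID TTY          TIME CMD
 2101 ?        00:00:00 lamd
 2345 pts/0    00:00:00 mpirun
 2346 ?        00:00:01 reduc
"""

print(count_command(SAMPLE, "mpirun"))
print(count_command(SAMPLE, "reduc"))
```

Running the same count for reduc on each node's saved listing gives a quick view of where the two copies actually landed.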

On Thursday 09 September 2004 12:31, Jeff Squyres wrote:
> So if LAM is booted on both nodes, double check this with the "tping"
> command, for example:
>
> tping -c 2 N
>
> And ensure that both nodes can be "seen". Also run lamnodes and verify
> that LAM thinks that there is only 1 CPU on each machine (i.e., mpirun
> is not trying to run 2 copies of reduc on one machine).
>
> It sounds like mpirun tried to launch 2 copies of reduc -- and as far
> as it knows, it *did* launch 2 copies of reduc (probably one on each
> node), but the reduc that it found on one machine was not an MPI
> process. Specifically, you *should* get a "file not found" error if it
> can't find reduc on one machine. So it must be finding it, but perhaps
> it's finding the "wrong" reduc (i.e., one that is not an MPI process
> and does not call MPI_INIT)...?
>
> On Sep 9, 2004, at 12:30 PM, Pravin R Joshi wrote:
> > Hi,
> > I am trying to get LAM/MPI 7.0.6 working on two nodes of a cluster
> > using RedHat Linux 9. I installed an RPM copy on one of the nodes and
> > from source on another node. Now when I do a lamboot -v hostfile
> > (hostfile has the names of the two machines) LAM is booted on both
> > the nodes, but when I run an MPI program (e.g. mpirun -np 2 reduc),
> > only one instance of the mpirun is started. This one is on the node
> > in which I did a source install. The other node does not start the
> > mpirun.
> > At the end of the mpirun I get the following error.
> > -----------------------------------------------------------------------
> > It seems that [at least] one of the processes that was started with
> > mpirun did not invoke MPI_INIT before quitting (it is possible that
> > more than one process did not invoke MPI_INIT -- mpirun was only
> > notified of the first one, which was on node n0).
> >
> > mpirun can *only* be used with MPI programs (i.e., programs that
> > invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> > to run non-MPI programs over the lambooted nodes.
> > -----------------------------------------------------------------------
> > Can someone help with this, please?
> > Pravin
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
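
The lamnodes check suggested in the thread (confirming that LAM sees 1 CPU on each machine) can be sketched in Python as below. This is only an illustration: the helper name cpus_per_node is hypothetical, the sample text is made up, and the assumed line format (node id, then hostname:cpu-count:flags) should be compared against what lamnodes actually prints on your installation.

```python
def cpus_per_node(lamnodes_output: str) -> dict:
    """Map each node id (e.g. 'n0') to a (hostname, cpu_count) pair."""
    result = {}
    for line in lamnodes_output.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].startswith("n"):
            continue  # not a node line
        host_fields = parts[1].split(":")
        if len(host_fields) < 2 or not host_fields[1].isdigit():
            continue  # does not match the assumed host:cpus:flags layout
        result[parts[0]] = (host_fields[0], int(host_fields[1]))
    return result


# Hypothetical sample in the assumed format, not real output from the thread.
SAMPLE = """\
n0  node1.example.com:1:origin,this_node
n1  node2.example.com:1:
"""

for node, (host, cpus) in cpus_per_node(SAMPLE).items():
    print(node, host, cpus)
```

If any node reports more than 1 CPU, that is worth checking against the concern raised in the thread that both copies of reduc might end up on one machine.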