
LAM/MPI General User's Mailing List Archives


From: Faisal Iqbal (faisal749_at_[hidden])
Date: 2007-02-10 09:18:06


Here is the output of:
 
 mpiexec n0 /common/hello
 -----------------------------------------------------------------------------
 It seems that [at least] one of the processes that was started with
 mpirun did not invoke MPI_INIT before quitting (it is possible that
 more than one process did not invoke MPI_INIT -- mpirun was only
 notified of the first one, which was on node n0).
 
 mpirun can *only* be used with MPI programs (i.e., programs that
 invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
 to run non-MPI programs over the lambooted nodes.
 -----------------------------------------------------------------------------
 mpirun failed with exit status 252
 
 The output is correct only for the head node; for all other nodes we get the aforementioned error.
 
 Faisal

Jeff Squyres <jsquyres_at_[hidden]> wrote:

On Feb 7, 2007, at 2:47 PM, Faisal Iqbal wrote:

> > Can you verify that /common/hello is exactly the same executable on
> > both nodes?
> [snipped]

All sounds good.

> > Can you run the /common/hello application just on n1? For example,
> > do the following on each of your two nodes:
> > - login
> > - lamboot
> > - /common/hello (i.e., run it without mpirun)
> > - lamhalt
> > I'm assuming it will work fine on n0 -- the question is whether it
> > will for n1.
> I tried "lamexec C /common/hello" and it worked so this shows that
> it is working on both the PCs.

Well that's just very peculiar. :-\

If you can run them manually, the only reason I can think that LAM's
mpirun would think that they failed is because they were compiled
with some other MPI (e.g., MPICH or some prior version of LAM). But
that's not consistent with what you said earlier -- that you can
mpirun it properly on just one node.
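
One way to double-check the earlier question (whether /common/hello is
really the same executable on every node) is to compare a digest of the
file as each node sees it. The following is only a minimal sketch, assuming
Python is available on the nodes; file_sha256 is a hypothetical helper
written for this illustration and is not part of LAM/MPI.

```python
import hashlib
import sys


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large executables are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Only runs when at least one path is given on the command line,
# e.g. the /common/hello path used in this thread.
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        print(file_sha256(path), path)
```

Running it on each node with /common/hello as the argument and comparing
the printed digests would show whether the nodes see the same binary. A
matching digest does not say which MPI the program was built against; it
only rules out the nodes running different files.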

When the error occurs, do you get core dumps? If so, can you get a
stack trace from them to see where exactly it is failing?

What is the exact output of "lamexec C /common/hello"?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/
 