LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-02-06 08:38:24


On Feb 6, 2007, at 1:10 AM, Faisal Iqbal wrote:

> When trying mpirun n0 /common/hello it worked and gave correct
> output. [n0 is my head node]

Good.

> However when trying to expand it to 2 pcs, i got the same error [It
> seems that at least one of the processes didn't invoke
> MPI_INIT.....] Here is the command i tried
> mpirun n0-1 /common/hello
> mpirun n0,1 /common/hello
>
> Below is the output for this
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with

Here are some additional questions, in no particular order:

- Can you verify that /common/hello is exactly the same executable on
both nodes?

- Can you verify that you have exactly the same version of LAM/MPI
installed on both nodes?

- Are both of your machines the same hardware and running the same OS
(distro, version, etc.)?

- Can you run the /common/hello application just on n1? For example,
do the following on each of your two nodes:
   - login
   - lamboot
   - /common/hello (i.e., run it without mpirun)
   - lamhalt
   I'm assuming it will work fine on n0 -- the question is whether it
   also works on n1.

- If the above doesn't work, do you get core dumps of the hello
application on n1? If so, can you get a stack trace of where it is
dying?
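
For the first question above (whether /common/hello is exactly the same executable on both nodes), one way to check is to compare a checksum of the file on each node. The following is only a minimal sketch, assuming Python is available on both machines; the function name `file_digest` is hypothetical and not part of LAM/MPI.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    `file_digest` is a hypothetical helper for illustration only.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so the whole file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `print(file_digest("/common/hello"))` on n0 and again on n1 should print the same digest if the two executables are byte-for-byte identical. Different digests would suggest that the nodes are not seeing the same binary.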

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems