On Feb 6, 2007, at 1:10 AM, Faisal Iqbal wrote:
> When trying mpirun n0 /common/hello it worked and gave correct
> output. [n0 is my head node]
Good.
> However when trying to expand it to 2 pcs, i got the same error [It
> seems that at least one of the processes didn't invoke
> MPI_INIT.....] Here is the command i tried
> mpirun n0-1 /common/hello
> mpirun n0,1 /common/hello
>
> Below is the output for this
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
Here are some additional questions, in no particular order:
- Can you verify that /common/hello is exactly the same executable on
both nodes?
- Can you verify that you have exactly the same version of LAM/MPI
installed on both nodes?
- Are both of your machines the same hardware and running the same OS
(distro, version, etc.)?
- Can you run the /common/hello application just on n1? For example,
do the following on each of your two nodes:
  - login
  - lamboot
  - /common/hello (i.e., run it without mpirun)
  - lamhalt
I'm assuming it will work fine on n0 -- the question is whether it
will for n1.
- If the above doesn't work, do you get core dumps of the hello
application on n1? If so, can you get a stack trace of where it is
dying?
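
For the first question, one way to check that /common/hello is the same file on both nodes is to compute a checksum on each node and compare the two values. Here is a minimal sketch, assuming Python is available on both nodes; the helper name file_sha256 is only illustrative and is not part of LAM/MPI.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that large executables do not have
    to fit in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Keep reading until read() returns an empty bytes object (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling print(file_sha256("/common/hello")) on n0 and again on n1 should print the same string if the two nodes see byte-identical executables; differing strings would suggest the nodes are not running the same binary.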
--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems