
LAM/MPI General User's Mailing List Archives


From: Faisal Iqbal (faisal749_at_[hidden])
Date: 2007-02-07 14:47:02


Dear Jeff
> Can you verify that /common/hello is exactly the same executable on
> both nodes?
Yes, it is the same executable on both PCs. It's on an NFS share that is mounted and working fine :)

> Can you verify that you have exactly the same version of LAM/MPI
> installed on both nodes?
Yes, the version is the same on both nodes, and so are all the LAM paths. I'm sure about this because I had already checked it. It's version 7.1.2.

> Are both of your machines the same hardware and running the same OS
> (distro, version, etc.)?
Both PCs are exactly the same (same brand/model) with identical specs, and both run the same OS, Fedora Core 4.

> Can you run the /common/hello application just on n1? For example,
> do the following on each of your two nodes:
> - login
> - lamboot
> - /common/hello (i.e., run it without mpirun)
> - lamhalt
> I'm assuming it will work fine on n0 -- the question is whether it
> will for n1.
I tried "lamexec C /common/hello" and it worked, which shows that it runs on both PCs.

Do you have any other ideas? I hope I'm not making some very stupid mistake!

Regards,
Faisal

Jeff Squyres <jsquyres_at_[hidden]> wrote: On Feb 6, 2007, at 1:10 AM, Faisal Iqbal wrote:

> When trying mpirun n0 /common/hello it worked and gave correct
> output. [n0 is my head node]

Good.

> However, when I tried to expand it to 2 PCs, I got the same error [It
> seems that at least one of the processes didn't invoke
> MPI_INIT.....]. Here are the commands I tried:
> mpirun n0-1 /common/hello
> mpirun n0,1 /common/hello
>
> Below is the output for this
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with

Here are some additional questions, in no particular order:

- Can you verify that /common/hello is exactly the same executable on
both nodes?

- Can you verify that you have exactly the same version of LAM/MPI
installed on both nodes?

- Are both of your machines the same hardware and running the same OS
(distro, version, etc.)?

- Can you run the /common/hello application just on n1? For example,
do the following on each of your two nodes:
   - login
   - lamboot
   - /common/hello (i.e., run it without mpirun)
   - lamhalt
   I'm assuming it will work fine on n0 -- the question is whether it
will for n1.

- If the above doesn't work, do you get core dumps of the hello
application on n1? If so, can you get a stack trace of where it is
dying?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/