On Feb 7, 2007, at 2:47 PM, Faisal Iqbal wrote:
> > Can you verify that /common/hello is exactly the same executable on
> > both nodes?
> [snipped]
All sounds good.
> > Can you run the /common/hello application just on n1? For example,
> > do the following on each of your two nodes:
> > - login
> > - lamboot
> > - /common/hello (i.e., run it without mpirun)
> > - lamhalt
> > I'm assuming it will work fine on n0 -- the question is whether it
> > will for n1.
> I tried "lamexec C /common/hello" and it worked, so this shows that
> it is working on both PCs.
Well that's just very peculiar. :-\
If you can run them manually, the only reason I can think of for LAM's
mpirun to think that they failed is that they were compiled
with some other MPI (e.g., MPICH or some prior version of LAM). But
that's not consistent with what you said earlier: that you can
mpirun it properly on just one node.
When the error occurs, do you get core dumps? If so, can you get a
stack trace from them to see where exactly it is failing?
What is the exact output of "lamexec C /common/hello"?
--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems