LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-06-28 21:41:09


On Mon, 28 Jun 2004, Wang Shuguang wrote:

> > > n0<9279> ssi:boot:base:linear: booting n0 (testmachine1)
> > > n0<9279> ssi:boot:base:linear: finished
> >
> > This is somewhat fishy -- you're booting on testmachine1, but the MPI
> > program reports that you have no lamd running on tmsi-mitl-grid.
> >
> > Are these the same machine?
>
> Yes, they are the same machine. I use a different name in the host
> file.

Thinking about this a bit more, a likely cause is that you have an MPI
program that was compiled for a different version of LAM.

That is, lamboot succeeded, and mpirun was able to connect to the lamd
(which means that the lamd was running properly), but when the MPI
executable ran, it couldn't find the lamd. If it was, indeed, compiled
for a different version of LAM, it may well be looking for the lamd in a
different place than where it lives for 7.0.4: the name of the session
directory where the lamd puts its named socket has changed over the
course of several versions.
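
To make that failure mode concrete, here is a toy model in Python. It is
only a sketch: the directory and socket names are invented for illustration
and are not LAM's real session directory names, and nothing here talks to
a real lamd. A stand-in "daemon" listens on a named socket under one naming
scheme; a client that uses the same scheme connects, while a client built
with a different scheme finds nothing at the path it expects.

```python
import os
import socket
import tempfile

# Hypothetical naming schemes: LAM's real directory names are NOT reproduced here.
def session_socket(base, scheme):
    """Return the socket path that a given (made-up) naming scheme would use."""
    return os.path.join(base, f"session-{scheme}", "daemon.sock")

def can_reach_daemon(path):
    """Try to connect to a Unix-domain socket; False if nothing is listening there."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return True
    except OSError:
        return False
    finally:
        client.close()

def demo():
    """Start a stand-in daemon under the 'new' scheme and probe it with both schemes."""
    with tempfile.TemporaryDirectory() as base:
        daemon_path = session_socket(base, "new")
        os.makedirs(os.path.dirname(daemon_path))
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(daemon_path)
        server.listen(1)
        try:
            same = can_reach_daemon(session_socket(base, "new"))
            other = can_reach_daemon(session_socket(base, "old"))
        finally:
            server.close()
        return same, other
```

In this model the daemon is healthy in both cases; the second client fails
only because it looks in the wrong place, which matches the symptom of a
successful lamboot followed by an application that reports no lamd.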

Can you double-check that your application was compiled for the same
version of LAM that you lambooted?
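
One quick, partial check is whether the LAM commands on your PATH all come
from the same installation. The Python sketch below is a hypothetical helper
(it is not part of LAM): it looks up lamboot, mpirun, and mpicc with the
standard library's shutil.which and reports whether they all live in one
directory. It assumes a single LAM install keeps its commands together in
one bin directory, and it cannot tell you what an already-built executable
was compiled against, so treat a "consistent" answer as a hint only.

```python
import os
import shutil

def install_dirs(commands, path=None):
    """Map each command name to the directory it resolves to on PATH (None if missing)."""
    dirs = {}
    for name in commands:
        found = shutil.which(name, path=path)
        dirs[name] = os.path.dirname(os.path.abspath(found)) if found else None
    return dirs

def same_install(dirs):
    """True only if every command was found and all of them share one directory."""
    values = set(dirs.values())
    return None not in values and len(values) == 1
```

For example, same_install(install_dirs(["lamboot", "mpirun", "mpicc"]))
returning False would suggest that the compiler wrapper and the run-time
commands come from different LAM installs, in which case recompiling the
application with the matching mpicc is the thing to try.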

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/