
LAM/MPI General User's Mailing List Archives


From: Ricardo Pereira (ricardo.np_at_[hidden])
Date: 2004-10-05 13:58:43


Hi,

I've just found out that the error I reported earlier was my fault.
The application had been compiled with another version of lam-mpi. I'm
really sorry I bothered you all.

Ricardo

On Tue, 5 Oct 2004 09:31:46 -0400, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Oct 5, 2004, at 9:22 AM, Ricardo Nishikido Pereira wrote:
>
> > I'm trying to run MPI applications on a heterogeneous cluster with
> > Linux PC and Macintosh nodes.
> >
> > I can lamboot correctly, but when I attempt to run an application I
> > get an error saying that the Linux nodes are using tcp while the Mac
> > nodes are using usysv:
> >
> > MPI_COMM_WORLD rank 0: tcp (v7.0.0)
> > MPI_COMM_WORLD rank 9: usysv (v7.1.0)
>
> This is unfortunately a known problem -- LAM does not do well
> coordinating when there are different modules available on different
> nodes (or, more specifically, when one module would be better than
> another on a given node).
>
> > Then I invoke mpirun, telling it to use tcp, and it says:
>
> You correctly deduced the answer: adding -ssi rpi tcp to the mpirun
> command line will force all ranks to use tcp. You could also use -ssi
> rpi usysv, since usysv also uses TCP for off-node communication.
>
> > MPI_COMM_WORLD rank 0: tcp (v7.0.0)
> > MPI_COMM_WORLD rank 9: tcp (v7.1.0)
> >
> > I've installed lam-7.1.1 on all nodes, so I don't know why there are
> > different versions of tcp. When I run programs only on the Mac nodes
> > or only on the Linux nodes, everything is fine.
>
> Double-check your paths when running non-interactive jobs on these
> nodes. Somehow it's finding an older TCP module -- perhaps a prior LAM
> installation? For example, compare the output of:
>
> ssh otherhost
> laminfo
>
> (i.e., an interactive login) vs. the following:
>
> ssh otherhost laminfo
>
> Check the path shown in the output of laminfo as well as the version
> numbers of the modules.
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>
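With two ranks, the mismatch in the per-rank lines in this thread (e.g. "MPI_COMM_WORLD rank 0: tcp (v7.0.0)") is easy to spot by eye, but on a larger cluster a small script can flag it. The following is a minimal Python sketch, assuming only the line format shown in this thread; the names `parse_rpi_report`, `rpi_problems`, and `RankRpi` are hypothetical helpers, not part of LAM/MPI. Differing versions (such as 7.0.0 vs. 7.1.0 on a cluster where 7.1.1 was installed everywhere) suggest that some nodes are picking up an older installation, as turned out to be the case here.

```python
import re
from typing import NamedTuple

# Assumed format, taken from the mpirun output quoted in this thread:
#   "MPI_COMM_WORLD rank 0: tcp (v7.0.0)"
# search() is used so that email quote prefixes like "> > " are tolerated.
RANK_LINE = re.compile(r"MPI_COMM_WORLD rank (\d+): (\S+) \(v([\d.]+)\)")


class RankRpi(NamedTuple):
    """One rank's RPI module name and version, as reported by mpirun."""
    rank: int
    module: str
    version: str


def parse_rpi_report(report: str) -> list[RankRpi]:
    """Extract (rank, module, version) entries from an mpirun error report."""
    entries = []
    for line in report.splitlines():
        match = RANK_LINE.search(line)
        if match:
            entries.append(RankRpi(int(match.group(1)), match.group(2), match.group(3)))
    return entries


def rpi_problems(entries: list[RankRpi]) -> list[str]:
    """Describe any disagreement in RPI module or version across ranks."""
    problems = []
    modules = sorted({e.module for e in entries})
    versions = sorted({e.version for e in entries})
    if len(modules) > 1:
        problems.append("different RPI modules: " + ", ".join(modules))
    if len(versions) > 1:
        problems.append("different module versions: " + ", ".join(versions))
    return problems


if __name__ == "__main__":
    # The second report from this thread: both ranks use tcp, but not the same build.
    report = """MPI_COMM_WORLD rank 0: tcp (v7.0.0)
MPI_COMM_WORLD rank 9: tcp (v7.1.0)"""
    for problem in rpi_problems(parse_rpi_report(report)):
        print(problem)
```

Run on the second report above, it flags only the version difference, which points to the stale-installation check (interactive vs. non-interactive laminfo) rather than to the choice of -ssi rpi module.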