Hi,
I verified the installations and they seem to be OK. Running laminfo
(both ways you mentioned, through interactive and non-interactive ssh,
on all nodes) displays the following:
LAM/MPI: 7.1.1
...
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
...
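For reference, the per-node check can be scripted. This is only a minimal sketch: the helper names rpi_summary and check_host are made up for illustration, the host names in the example are hypothetical, and it assumes ssh to each node works without a password prompt and that laminfo prints the "LAM/MPI:" and "SSI rpi:" lines as shown above.

```shell
#!/bin/sh
# Keep only the LAM version line and the rpi module lines
# from laminfo output read on stdin.
rpi_summary() {
    grep -E 'LAM/MPI:|SSI rpi:'
}

# Print the summary for one host using a non-interactive ssh command,
# i.e. the same environment that remotely started processes would see.
check_host() {
    echo "== $1 =="
    ssh "$1" laminfo | rpi_summary
}

# Example (hypothetical host names):
#   for h in linux01 mac01; do check_host "$h"; done
```

Any node whose summary differs from the others is the one whose non-interactive PATH deserves a closer look.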
By the way, how can I verify at run time which version of the
tcp/usysv modules my applications are using?
Thanks,
Ricardo
On Tue, 5 Oct 2004 09:31:46 -0400, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Oct 5, 2004, at 9:22 AM, Ricardo Nishikido Pereira wrote:
>
> > I'm trying to run MPI applications in a heterogeneous cluster, with
> > Linux PC and Macintosh nodes.
> >
> > I can lamboot correctly, but when I attempt to run an application I
> > get an error saying that linux nodes are using tcp while mac nodes are
> > using usysv:
> >
> > MPI_COMM_WORLD rank 0: tcp (v7.0.0)
> > MPI_COMM_WORLD rank 9: usysv (v7.1.0)
>
> This is unfortunately a known problem -- LAM does not do well
> coordinating when there are different modules available on different
> nodes (or, more specifically, when one module would be better than
> another on a given node).
>
> > Then, I try to invoke mpirun telling it to use tcp and it says:
>
> You correctly deduced the answer: adding -ssi rpi tcp to the mpirun
> command line will force all ranks to use tcp. You could also use
> -ssi rpi usysv, since usysv also uses TCP for off-node communication.
>
> > MPI_COMM_WORLD rank 0: tcp (v7.0.0)
> > MPI_COMM_WORLD rank 9: tcp (v7.1.0)
> >
> > I've installed lam-7.1.1 in all nodes, so I don't know why there are
> > different versions of tcp. When I run programs only in the mac nodes
> > or only in the linux nodes everything is fine.
>
> Double check your paths when running non-interactive jobs on these
> nodes. Somehow it's finding an older TCP module -- perhaps a prior LAM
> installation? For example, compare the output of:
>
> ssh otherhost
> laminfo
>
> (i.e., an interactive login) vs. the following:
>
> ssh otherhost laminfo
>
> Check the path shown in the output of laminfo as well as the version
> numbers of the modules.
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>