LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-03-13 23:31:53


On Mar 6, 2006, at 12:24 AM, Swati Longia wrote:

> I have a demo Beowulf cluster consisting of just 2 machines, master and
> slave. I have installed LAM on it. The version of LAM I have is 7.1.1.
> I installed it exactly the way it is specified in the manual.
>
> When I try to boot it up, it gives me the following error.
> lamboot -v lamhosts
>
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> n-1<6141> ssi:boot:base:linear: booting n0 (slave)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 10.1.22.165: Connection refused
> connect to address 10.1.22.165: Connection refused
> trying normal rsh (/usr/bin/rsh)

<snip>

> Try invoking the following command at the unix command line:
>
> rsh slave -n 'echo $SHELL'

<snip>

> I tried the following command
> ssh slave -n 'echo $SHELL'
> It gave me the proper result without asking for a password, but when I try
> to do the same with 'rsh', it hangs.
> Can someone help me with this?

It looks like your cluster is configured to use ssh instead of rsh, so
the easiest solution is to have LAM/MPI use ssh as well. There are two
ways to configure LAM/MPI to do that:

   1) rebuild LAM, adding the --with-rsh=ssh flag to configure
   2) set the LAMRSH environment variable to ssh

Based on your test with "ssh slave -n 'echo $SHELL'" working without
a password, I would be willing to bet that LAM/MPI will work properly
if either change is made.
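
As a rough sketch of the second option, assuming a Bourne-style shell (sh
or bash) and the same "lamhosts" boot schema file from your message,
something along these lines should do it. The "command -v" guard is only
there so the snippet is harmless on a machine without LAM installed; it is
not part of LAM itself.

```shell
# Option 2: tell LAM/MPI to use ssh as its remote agent.
# (sh/bash syntax; csh/tcsh users would use `setenv LAMRSH ssh` instead.)
LAMRSH=ssh
export LAMRSH

# Optional sanity check on the cluster: repeat the test LAM suggested,
# but through the agent named in $LAMRSH. "slave" is the hostname from
# the original post.
# $LAMRSH slave -n 'echo $SHELL'

# Boot as before. The guard only avoids an error where LAM is not installed.
if command -v lamboot >/dev/null 2>&1; then
    lamboot -v lamhosts
else
    echo "lamboot not found in PATH; LAMRSH is set to: $LAMRSH"
fi

# Option 1 (alternative): rebuild LAM with ssh as the default agent.
# Only the --with-rsh=ssh flag comes from this message; reuse whatever
# other configure options the original build used.
# ./configure --with-rsh=ssh
```

Setting LAMRSH only affects the current shell session, so it would need to
go in your shell startup file to be permanent; rebuilding with the
configure flag makes ssh the default for everyone using that installation.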

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/