On Sep 6, 2007, at 1:23 AM, jeevitesh_at_[hidden] wrote:
> Hi MPI/LAM Group,
> In my LAN, I have installed LAM/MPI on three systems, and I am
> getting the following error:
>
> lamboot -v -ssi boot_rsh_ignore_stderr hostfile
Note that "-ssi" takes 2 parameters (a key and a value); I think you
are missing the "1" value for the boot_rsh_ignore_stderr key:
lamboot -v -ssi boot_rsh_ignore_stderr 1 hostfile
As written, the "hostfile" argument is therefore being ignored (i.e.,
taken as the nonsensical value of the boot_rsh_ignore_stderr SSI
parameter).
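To illustrate the point about "-ssi" consuming two tokens, here is a rough sketch in POSIX shell. It is only a simplified model of the argument handling, not LAM's actual parser; the function name parse_lamboot_args is made up for this example, and it assumes that any other flag (such as -v) takes no value.

```shell
# Simplified model (NOT LAM's real parser): "-ssi" always consumes the
# next two tokens as <key> <value>; a leftover bare token is the hostfile.
parse_lamboot_args() {
    hostfile="(none)"
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -ssi)
                # Report the key/value pair that "-ssi" swallows.
                echo "ssi: ${2-} = ${3-}"
                if [ "$#" -ge 3 ]; then shift 3; else shift "$#"; fi
                ;;
            -*)
                # Other flags (e.g. -v) take no value in this model.
                shift
                ;;
            *)
                hostfile="$1"
                shift
                ;;
        esac
    done
    echo "hostfile: $hostfile"
}

# Original command line: "hostfile" is swallowed as the SSI value.
parse_lamboot_args -v -ssi boot_rsh_ignore_stderr hostfile

# Corrected command line: "1" is the value, "hostfile" is left over.
parse_lamboot_args -v -ssi boot_rsh_ignore_stderr 1 hostfile
```

In this model the first call ends with no hostfile at all, which is consistent with the output quoted below, where only n0 (localhost) is booted.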
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
> n-1<22789> ssi:boot:base:linear: booting n0 (localhost)
> n-1<22789> ssi:boot:base:linear: finished
>
> Here I was able to boot LAM on only one system. I have taken
> the following steps:
>
> 1. Put the IP addresses of the other two systems in the hostfile.
> 2. Created .rhosts in the home directory with IP and username (I have
> my user account on all three systems).
> 3. Installed LAM on all three systems.
> 4. I am able to rsh to each individual system,
> but I get the following warning:
>
> rsh 192.168.1.141
> connect to address 192.168.1.141 port 543: Connection refused
> Trying krb4 rlogin...
> connect to address 192.168.1.141 port 543: Connection refused
> trying normal rlogin (/usr/bin/rlogin)
> Last login: Wed Sep 5 10:36:38 on :0
This means that rsh is falling back to a different protocol to log
in (its first connection attempts are refused, so it ends up using
normal rlogin). Perhaps you might want to try a different service,
such as ssh? You can set which agent LAM uses (rsh vs. ssh) at run
time -- see
http://www.lam-mpi.org/faq/category4.php3#question14.
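As a rough sketch of what switching the agent could look like, the snippet below only prints a candidate lamboot command line (a dry run) rather than executing it. It assumes that the rsh boot module's agent is selected with the boot_rsh_agent SSI parameter and that "ssh -x" is a suitable agent string; please check both against the FAQ entry above for your LAM version. The helper name build_lamboot_cmd is made up for this example.

```shell
# Dry run: print the lamboot command instead of running it.
# Assumption: the agent is chosen via the boot_rsh_agent SSI parameter;
# verify this against the FAQ for your LAM version.
agent="ssh -x"        # ssh's -x option disables X11 forwarding
hostfile="hostfile"   # same boot schema file as in the original command

build_lamboot_cmd() {
    # Each -ssi takes exactly two tokens: <key> <value>.
    printf 'lamboot -v -ssi boot_rsh_agent "%s" -ssi boot_rsh_ignore_stderr 1 %s\n' "$1" "$2"
}

build_lamboot_cmd "$agent" "$hostfile"
```

Note that passwordless ssh logins to every node would need to work first, just as passwordless rsh does now.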
> But I am able to log in without being prompted for a password.
>
> I have tried the conventional way of booting LAM, but it did not
> complete successfully,
> so I followed the above way of booting.
>
> lamboot -v -ssi boot rsh hostfile
>
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<23096> ssi:boot:base:linear: booting n0 (192.168.1.125)
> n-1<23096> ssi:boot:base:linear: booting n1 (192.168.1.141)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 192.168.1.141 port 544: Connection refused
> connect to address 192.168.1.141 port 544: Connection refused
> trying normal rsh (/usr/bin/rsh)
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "192.168.1.141",
> but received some output on the standard error. This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts. You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> Thanks & regards
> jeevitesh
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems