Besides, you can try putting the boot_rsh_ignore_stderr 1
setting in your .bashrc file instead of in the boot section.
Also, are you booting from the master or from the nodes?
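
A rough sketch of the .bashrc approach, in case it helps: LAM/MPI also
reads SSI parameters from environment variables named
LAM_MPI_SSI_<parameter>, so a line like the one below should have the
same effect as passing "-ssi boot_rsh_ignore_stderr 1" to lamboot. This
assumes a bash shell and that .bashrc is read on the machine where
lamboot is run; "hostfile" is the file name from the original post.

```shell
# ~/.bashrc fragment (assumed location; adjust for your shell setup)
# Ask the rsh boot module to ignore output on stderr during lamboot.
export LAM_MPI_SSI_boot_rsh_ignore_stderr=1

# With the variable exported, the flag can be left off the command line:
#   lamboot -v hostfile
```

After editing .bashrc, open a new shell (or source the file) before
running lamboot again.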
Roberto Scipioni
ICYS, NIMS Japan
Administrator ICYS Computing cluster
----- Original Message -----
From: Jeff Squyres <jsquyres_at_[hidden]>
Date: Tuesday, September 11, 2007 9:22 pm
Subject: Re: LAM: Able to boot LAM only on single node
> On Sep 6, 2007, at 1:23 AM, jeevitesh_at_[hidden] wrote:
>
> > Hi MPI/LAM Group,
> > In my LAN, I have installed LAM/MPI on three systems, and I am
> > getting the following error:
> >
> > lamboot -v -ssi boot_rsh_ignore_stderr hostfile
>
> Note that "-ssi" takes 2 parameters; I think you are missing the "1"
> value to the boot_rsh_ignore_stderr token:
>
> lamboot -v -ssi boot_rsh_ignore_stderr 1 hostfile
>
> And therefore the "hostfile" argument is being ignored (i.e., taken
> as the nonsensical value for the boot_rsh_ignore_stderr SSI parameter).
>
> > LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
> > n-1<22789> ssi:boot:base:linear: booting n0 (localhost)
> > n-1<22789> ssi:boot:base:linear: finished
> >
> > Here I was able to boot LAM on only one system. I have taken
> > the following steps:
> >
> > 1. The hostfile contains the IP addresses of the other two systems.
> > 2. .rhosts in the home directory contains the IP and username (I have
> >    my user account on all three systems).
> > 3. Installed LAM on all three systems.
> > 4. I am able to rsh to each individual system,
> >    but I get the following warning:
> >
> > rsh 192.168.1.141
> > connect to address 192.168.1.141 port 543: Connection refused
> > Trying krb4 rlogin...
> > connect to address 192.168.1.141 port 543: Connection refused
> > trying normal rlogin (/usr/bin/rlogin)
> > Last login: Wed Sep 5 10:36:38 on :0
>
> This means that rsh is falling back to a different protocol to
> log in. Perhaps you might want to try a different service, such as ssh?
> You can set which agent LAM uses (rsh vs. ssh) at run time -- see
> http://www.lam-mpi.org/faq/category4.php3#question14.
>
> > But I am able to log in without being prompted for a password.
> >
> > I have tried the conventional way of booting LAM, but it did not
> > complete successfully, so I followed the above way of booting.
> >
> > lamboot -v -ssi boot rsh hostfile
> >
> > LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
> >
> > n-1<23096> ssi:boot:base:linear: booting n0 (192.168.1.125)
> > n-1<23096> ssi:boot:base:linear: booting n1 (192.168.1.141)
> > ERROR: LAM/MPI unexpectedly received the following on stderr:
> > connect to address 192.168.1.141 port 544: Connection refused
> > connect to address 192.168.1.141 port 544: Connection refused
> > trying normal rsh (/usr/bin/rsh)
> > -----------------------------------------------------------------------------
> > LAM attempted to execute a process on the remote node "192.168.1.141",
> > but received some output on the standard error. This heuristic
> > assumes that any output on the standard error indicates a fatal error,
> > and therefore aborts. You can disable this behavior (i.e., have LAM
> > ignore output on standard error) in the rsh boot module by setting the
> > SSI parameter boot_rsh_ignore_stderr to 1.
> >
> > LAM tried to use the remote agent command "rsh"
> > to invoke "echo $SHELL" on the remote node.
> >
> > Thanks & regards
> > jeevitesh
> >
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>
> --
> Jeff Squyres
> Cisco Systems
>