LAM/MPI General User's Mailing List Archives

From: Jim Corbett (jcorb54867_at_[hidden])
Date: 2003-07-14 18:03:32


Did you ever get your cluster to lamboot?
Jim C
----- Original Message -----
From: "Sergei Lisenkov" <proffess_at_[hidden]>
To: <lam_at_[hidden]>
Sent: Thursday, July 10, 2003 7:38 AM
Subject: LAM: LAM on PC-clusters

> Dear LAM users,
>
> I tried to run LAM on 4 nodes (2 CPUs per node), but I got an error. I created
> a file "mynodes":
>
> panda cpu=2
> panda2 cpu=2
> panda3 cpu=2
> panda4 cpu=2
>
> I did:
> [proffess_at_panda work]$ lamboot mynodes
>
> LAM 7.0/MPI 2 C++/ROMIO - Indiana University
>
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> rcmd: socket: Permission denied
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node "panda2".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command "/home/proffess/bin/rsh"
> to invoke "echo $SHELL" on the remote node.
>
> This usually indicates an authentication problem with the remote
> agent, or some other configuration type of error in your .cshrc or
> .profile file.  The following is a list of items that you may wish to
> check on the remote node:
>
>         - You have an account and can login to the remote machine
>         - Incorrect permissions on your home directory (should
>           probably be 0755)
>         - Incorrect permissions on your $HOME/.rhosts file (if you are
>           using rsh -- they should probably be 0644)
>         - You have an entry in the remote $HOME/.rhosts file (if you
>           are using rsh) for the machine and username that you are
>           running from
>         - Your .cshrc/.profile must not print anything out to the
>           standard error
>         - Your .cshrc/.profile should set a correct TERM type
>         - Your .cshrc/.profile should set the SHELL environment
>           variable to your default shell
>
> Try invoking the following command at the unix command line:
>
>         /home/proffess/bin/rsh panda2 -n echo $SHELL
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> rcmd: socket: Permission denied
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node "panda2".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command "/home/proffess/bin/rsh"
> to invoke "echo $SHELL" on the remote node.
>
> This usually indicates an authentication problem with the remote
> agent, or some other configuration type of error in your .cshrc or
> .profile file.  The following is a list of items that you may wish to
> check on the remote node:
>
>         - You have an account and can login to the remote machine
>         - Incorrect permissions on your home directory (should
>           probably be 0755)
>         - Incorrect permissions on your $HOME/.rhosts file (if you are
>           using rsh -- they should probably be 0644)
>         - You have an entry in the remote $HOME/.rhosts file (if you
>           are using rsh) for the machine and username that you are
>           running from
>         - Your .cshrc/.profile must not print anything out to the
>           standard error
>         - Your .cshrc/.profile should set a correct TERM type
>         - Your .cshrc/.profile should set the SHELL environment
>           variable to your default shell
>
> Try invoking the following command at the unix command line:
>
>         /home/proffess/bin/rsh panda2 -n echo $SHELL
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
>
> What does it mean? I can log in to panda2, panda3, ..., using rsh without a
> password. rsh is in my PATH. What should I do?
>
> Thanks,
> Sergey
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
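
A note on the error quoted above: "rcmd: socket: Permission denied" is
typically what a classic rsh client prints when it cannot bind a reserved
(below 1024) source port. Binding such a port needs root privilege, which is
why system copies of rsh were traditionally installed setuid root. A private
copy such as /home/proffess/bin/rsh may lack that bit. This is one plausible
cause, not something confirmed in this thread. The sketch below turns the
checklist from the lamboot message into shell functions. The function names
(mode_of, check_mode, check_agent, try_agent) are hypothetical helpers and
are not part of LAM.

```shell
#!/bin/sh
# Sketch only: helper names are assumptions, not LAM commands.

# Print the octal permission bits of a path (GNU stat, then BSD stat).
mode_of() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

# Compare a path's mode with the value the lamboot message suggests
# (0755 for the home directory, 0644 for $HOME/.rhosts).
check_mode() {
    target=$1
    want=$2
    if [ ! -e "$target" ]; then
        echo "MISSING $target"
        return 1
    fi
    have=$(mode_of "$target")
    if [ "$have" = "$want" ]; then
        echo "OK $target ($have)"
    else
        echo "CHECK $target (mode $have, message suggests $want)"
        return 1
    fi
}

# Report whether the remote agent binary carries the setuid bit.
# Assumption: a classic rcmd-based rsh needs it to open a reserved port.
check_agent() {
    agent=$1
    if [ ! -x "$agent" ]; then
        echo "MISSING or not executable: $agent"
        return 1
    fi
    if [ -u "$agent" ]; then
        echo "OK $agent is setuid"
    else
        echo "CHECK $agent is not setuid"
        return 1
    fi
}

# Run the same probe lamboot attempts. The single quotes make the remote
# shell, not the local one, expand $SHELL.
try_agent() {
    "$1" "$2" -n 'echo $SHELL'
}
```

Possible usage, with the agent path and host name taken from the message
above (the two check_mode calls belong on the remote node, where the
message says to look):

    check_mode "$HOME" 755
    check_mode "$HOME/.rhosts" 644
    check_agent /home/proffess/bin/rsh
    try_agent /home/proffess/bin/rsh panda2

If check_agent reports CHECK, pointing LAM at the system rsh instead of the
private copy is one thing to try.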