Dear all,
I need to set up a two-node cluster for lamboot. I am using Fedora 7.
1. Passwordless ssh is set up.
2. A folder in the home directory is NFS-mounted on both the server
(10.1.7.136) and the client (10.1.7.129).
3. LAM 7.1.1 is installed on the system.
4. A file named lamhost is created containing the IPs of the server and
client, respectively:
[lamuser_at_localhost lam]$ cat lamhost
10.1.7.136
10.1.7.129
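Before booting, each address in the host file can be probed with the same command LAM itself issues (shown in the error text further down). The following is only a sketch under assumptions: AGENT, DRY_RUN and probe_hosts are hypothetical names for this example, the host file is recreated in a temporary location, and by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: print (or run) LAM's shell probe for every host in a boot schema.
# AGENT, DRY_RUN and probe_hosts are hypothetical names for this example.
AGENT="${AGENT:-ssh -x}"   # remote agent to test; LAM's own default is rsh
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

probe_hosts() {
    # $1: path to a host file with one address per line
    while read -r host rest; do
        # skip blank lines and comment lines
        case "$host" in ''|'#'*) continue ;; esac
        if [ "$DRY_RUN" = 1 ]; then
            echo "$AGENT $host -n 'echo \$SHELL'"
        else
            # stdin is redirected so the agent cannot consume the host list
            $AGENT "$host" -n 'echo $SHELL' </dev/null \
                || echo "FAILED: $host" >&2
        fi
    done < "$1"
}

# Recreate the two-line host file from the listing above.
hostfile=$(mktemp)
trap 'rm -f "$hostfile"' EXIT
printf '10.1.7.136\n10.1.7.129\n' > "$hostfile"
probe_hosts "$hostfile"
```

Running it with DRY_RUN=0 executes the probes for real; each one should print the remote shell without asking for a password.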
lamboot works on each system individually, but with both nodes it fails:
[lamuser_at_localhost ~]$ lamboot -v lamhost
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
n-1<3088> ssi:boot:base:linear: booting n0 (10.1.7.136)
n-1<3088> ssi:boot:base:linear: booting n1 (10.1.7.129)
ERROR: LAM/MPI unexpectedly received the following on stderr:
10.1.7.129: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "10.1.7.129".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.
LAM tried to use the remote agent command "rsh"
to invoke "echo $SHELL" on the remote node.
*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.
This usually indicates an authentication problem with the remote
agent, some other configuration type of error in your .cshrc or
.profile file, or you were unable to executable a command on the
remote node for some other reason. The following is a list of items
that you should check on the remote node:
- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should
  probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are
  using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you
  are using rsh) for the machine and username that you are
  running from
- Your .cshrc/.profile must not print anything out to the
  standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment
  variable to your default shell
Try invoking the following command at the unix command line:
rsh 10.1.7.129 -n 'echo $SHELL'
You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.
When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<3088> ssi:boot:base:linear: Failed to boot n1 (10.1.7.129)
n-1<3088> ssi:boot:base:linear: aborted!
n-1<3093> ssi:boot:base:linear: booting n0 (10.1.7.136)
n-1<3093> ssi:boot:base:linear: booting n1 (10.1.7.129)
[same error message as above, repeated verbatim]
n-1<3093> ssi:boot:base:linear: Failed to boot n1 (10.1.7.129)
n-1<3093> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully
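Note that the log reports the remote agent as "rsh", while step 1 above set up passwordless ssh; "Connection refused" would be consistent with no rsh service listening on 10.1.7.129. A minimal sketch of selecting ssh instead, assuming the LAMRSH environment variable and the boot_rsh_agent SSI parameter described in the LAM/MPI documentation apply to this 7.1.1 install (build_lamboot_cmd is a hypothetical helper that only prints the command):

```shell
#!/bin/sh
# Sketch: ask lamboot to use ssh rather than its default rsh agent.
# build_lamboot_cmd is a hypothetical helper for this example only.
LAMRSH="ssh -x"   # -x turns off X11 forwarding, avoiding extra stderr output
export LAMRSH

build_lamboot_cmd() {
    # $1: boot schema (host file); prints the per-invocation alternative
    printf 'lamboot -v -ssi boot_rsh_agent "%s" %s\n' "$LAMRSH" "$1"
}

build_lamboot_cmd lamhost
```

With LAMRSH exported in the shell, a plain "lamboot -v lamhost" should pick up the same agent; the -ssi form sets it for a single invocation only.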
Please help.
--
Ankur Pachauri.
09927590910
Research Scholar,
software engineering.
Department of Mathematics
Dayalbagh Educational Institute
Dayalbagh,
AGRA