
LAM/MPI General User's Mailing List Archives


From: Irshad Ahmed (irshi2000_at_[hidden])
Date: 2004-04-05 13:47:56


Hello,

I am using Red Hat 9, and I get the following error when booting LAM with a "lamhost" file that contains two nodes:

1- Master, 2- Slave

>>> Both machines have the same account.

>>> The .rhosts file contains "+ +".

>>> There is no .cshrc; instead I have a .bashrc in "/home/ahmed/".

>>> /etc/hosts.equiv contains "+ +".

>>> I get the error shown below.

>>> ssh is also not working, even with a password. It worked before I accidentally changed the "known_hosts" file in the .ssh folder.
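If the edited known_hosts file is what broke ssh, one possible repair is to keep a backup and drop only the lines for the affected host, so ssh offers to record that host's key again on the next login. This is a minimal sketch that assumes the classic unhashed known_hosts format (each line starts with a comma-separated host list, then the key type and key). The function name `prune_known_hosts` is hypothetical, not part of OpenSSH or LAM, and "Master" is just the node name from the lamhost file.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH or LAM): remove every line of a
# known_hosts file whose comma-separated host list names the given host.
# Assumes the classic unhashed format: "host1,host2,... keytype key".
prune_known_hosts() {
  host=$1
  kh=${2:-"$HOME/.ssh/known_hosts"}

  [ -f "$kh" ] || return 0           # nothing to prune
  cp -p "$kh" "$kh.bak" || return 1  # keep a backup before editing

  # Print only the lines whose first field does not list $host exactly.
  awk -v h="$host" '
    {
      n = split($1, names, ",")
      drop = 0
      for (i = 1; i <= n; i++) if (names[i] == h) drop = 1
      if (!drop) print
    }' "$kh.bak" > "$kh" || { cp -p "$kh.bak" "$kh"; return 1; }
}

# Example: forget the stored key(s) for the node called Master.
#   prune_known_hosts Master
```

Matching whole host names (rather than a substring grep) keeps entries such as "MasterNode" intact. After pruning, logging in once with `ssh Master` should ask to confirm the host key before it is stored again.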

 *************************************************************************

[ahmed@Slave ahmed]$ lamboot -v -ssi boot rsh /home/ahmed/lamhost

LAM 7.0.3/MPI 2 C++/ROMIO - Indiana University

n0<2906> ssi:boot:base:linear: booting n0 (Slave)
n0<2906> ssi:boot:base:linear: booting n1 (Master)
ERROR: LAM/MPI unexpectedly received the following on stderr:
Master: Connection refused

-----------------------------------------------------------------------------

LAM failed to execute a process on the remote node "Master".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "rsh"
to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote
agent, or some other configuration type of error in your .cshrc or
.profile file. The following is a list of items that you may wish to
check on the remote node:

- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should
  probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are
  using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you
  are using rsh) for the machine and username that you are
  running from
- Your .cshrc/.profile must not print anything out to the
  standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment
  variable to your default shell

Try invoking the following command at the unix command line:

    rsh Master -n echo $SHELL

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.

-----------------------------------------------------------------------------

n0<2906> ssi:boot:base:linear: Failed to boot n1 (Master)
n0<2906> ssi:boot:base:linear: aborted!

-----------------------------------------------------------------------------

lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.

-----------------------------------------------------------------------------

n0<2911> ssi:boot:base:linear: booting n0 (Slave)
n0<2911> ssi:boot:base:linear: booting n1 (Master)
ERROR: LAM/MPI unexpectedly received the following on stderr:
Master: Connection refused

-----------------------------------------------------------------------------

LAM failed to execute a process on the remote node "Master".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "rsh"
to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote
agent, or some other configuration type of error in your .cshrc or
.profile file. The following is a list of items that you may wish to
check on the remote node:

- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should
  probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are
  using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you
  are using rsh) for the machine and username that you are
  running from
- Your .cshrc/.profile must not print anything out to the
  standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment
  variable to your default shell

Try invoking the following command at the unix command line:

    rsh Master -n echo $SHELL

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.

-----------------------------------------------------------------------------

n0<2911> ssi:boot:base:linear: Failed to boot n1 (Master)
n0<2911> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully
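The checks LAM asks for (no password prompt, nothing printed before the command's output, nothing on stderr) can be scripted so they are easy to re-run on each node. This is a minimal sketch, not a LAM tool: the function name `check_remote_shell` is hypothetical, and passing `"ssh -o BatchMode=yes"` as the agent is an assumption about how you would test ssh instead of rsh. One detail worth knowing: typed exactly as LAM prints it at a shell prompt, the unquoted `$SHELL` is expanded by the local shell before rsh runs, so the sketch single-quotes it to have the remote side expand it.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of LAM): run by hand the same probe
# lamboot starts with, "<agent> <node> -n echo $SHELL", and verify that the
# remote side prints exactly one line (a shell path) and nothing on stderr.
check_remote_shell() {
  node=$1
  agent=${2:-rsh}   # e.g. "ssh -o BatchMode=yes" so ssh fails instead of prompting
  errfile=$(mktemp) || return 1

  # Single quotes stop the *local* shell from expanding $SHELL;
  # the remote shell should expand it instead.
  out=$($agent "$node" -n 'echo $SHELL' 2>"$errfile")
  rc=$?
  err=$(cat "$errfile")
  rm -f "$errfile"

  if [ "$rc" -ne 0 ]; then
    echo "$node: remote agent failed (exit $rc): $err"
    return 1
  fi
  if [ -n "$err" ]; then
    echo "$node: unexpected output on stderr: $err"
    return 1
  fi
  if [ "$(printf '%s\n' "$out" | wc -l | tr -d ' ')" != 1 ]; then
    echo "$node: extra output around the shell name: $out"
    return 1
  fi
  case $out in
    /*) echo "$node: OK ($out)" ;;
    *)  echo "$node: unexpected output: $out"; return 1 ;;
  esac
}

# Example usage from the node where lamboot is run:
#   check_remote_shell Master
#   check_remote_shell Master "ssh -o BatchMode=yes"
```

Note that "Connection refused", as in the log above, means the connection itself was rejected because nothing was accepting it on the remote side (for rsh, often because the rsh service is not enabled on that node). That is a different failure from a .rhosts or hosts.equiv authentication problem, which would normally be reported as a permission error instead.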

Ahmed irshad
