LAM/MPI General User's Mailing List Archives

From: Irshad Ahmed (irshi2000_at_[hidden])
Date: 2004-04-08 01:34:37


Hi,

1- Passwordless rsh from Master to Slave and from Slave to Master works; the only output is the message "last login: Thursday 08, 2004 from Master".

2- There is no .cshrc/.profile in the user's home directory.

3- There are no /etc/.cshrc and /etc/.login files; instead there are csh.cshrc and csh.login files in the /etc directory.

4- I think it cannot find "hboot" to execute. I added the path "/usr/local/bin" to the /etc/profile file, but it did not work.

5- How can I find out whether .cshrc/.profile prints anything to standard error?

6- How can I check that .cshrc/.profile sets a correct TERM type?

7- How can I check that .cshrc/.profile sets the SHELL environment variable to the default shell?
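
For item 5, one possible check is to run a harmless command through rsh and see whether anything arrives on standard error. The following is only a minimal sketch: the helper name stderr_is_quiet is hypothetical, and the commented usage lines assume the node name Slave from this post.

```shell
# Hypothetical helper (not part of LAM): succeeds only if the given
# command writes nothing to standard error. Standard output is discarded.
stderr_is_quiet() {
    # "2>&1 >/dev/null" sends stderr into the capture and drops stdout.
    # "|| true" ignores the command's exit status; only stderr matters here.
    _err=$("$@" 2>&1 >/dev/null) || true
    [ -z "$_err" ]
}

# Intended use against the remote node (not run here):
#   stderr_is_quiet rsh Slave -n true || echo "startup files print to stderr"
# For items 6 and 7, the remote values can simply be printed:
#   rsh Slave -n 'echo "TERM=$TERM SHELL=$SHELL"'
```

If the helper fails for a command as trivial as "true", the extra output most likely comes from the login or startup files rather than from the command itself.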

****************************************************

ERROR DISPLAYED

****************************************************

[ahmed_at_Master ahmed]$ lamboot -v -ssi boot rsh /home/ahmed/lamhost

LAM 7.0.3/MPI 2 C++/ROMIO - Indiana University

n0<3345> ssi:boot:base:linear: booting n0 (Slave)

ERROR: LAM/MPI unexpectedly received the following on stderr:

bash: line 1: hboot: command not found

-----------------------------------------------------------------------------

LAM attempted to execute a process on the remote node "Slave",

but received some output on the standard error.

LAM tried to use the remote agent command "rsh"

to invoke "hboot" on the remote node.

This can indicate an authentication error with the remote agent, or

can indicate an error in your $HOME/.cshrc, $HOME/.login, or

$HOME/.profile files. The following is a list of items that you may

wish to check on the remote node:

- You have an account and can login to the remote machine

- Incorrect permissions on your home directory (should

probably be 0755)

- Incorrect permissions on your $HOME/.rhosts file (if you are

using rsh -- they should probably be 0644)

- You have an entry in the remote $HOME/.rhosts file (if you

are using rsh) for the machine and username that you are

running from

- Your .cshrc/.profile must not print anything out to the

standard error

- Your .cshrc/.profile should set a correct TERM type

- Your .cshrc/.profile should set the SHELL environment

variable to your default shell

Try invoking the following command at the unix command line:

rsh Slave -n hboot -t -c lam-conf.lamd -v -s -I "-H 10.0.0.10 -P 1059 -n 0 -o 1"

You will need to configure your local setup such that you will *not*

be prompted for a password to invoke this command on the remote node.

No output should be printed from the remote node before the output of

the command is displayed.

When you can get this command to execute successfully by hand, LAM

will probably be able to function properly.

-----------------------------------------------------------------------------

n0<3345> ssi:boot:base:linear: Failed to boot n0 (Slave)

n0<3345> ssi:boot:base:linear: aborted!

-----------------------------------------------------------------------------

lamboot encountered some error (see above) during the boot process,

and will now attempt to kill all nodes that it was previously able to

boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may

have LAM daemons still running on remote nodes.

-----------------------------------------------------------------------------

lamboot: wipe -- nothing to do

lamboot did NOT complete successfully

*************************************************

END ERROR

*************************************************

[ahmed_at_Master ahmed]$ rsh Slave -n hboot -t -c lam-conf.lamd -v -s -I "-H 10.0.0.10 -P 1057 -n 0 -o 1"

bash: line 1: hboot: command not found
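
The "command not found" message suggests that the PATH seen by the non-interactive remote shell does not include the directory holding hboot. Bash normally reads /etc/profile only for login shells, so a PATH entry added there may not apply to commands started through rsh. A minimal sketch for checking this, assuming hboot is installed in /usr/local/bin as in this post; the helper name path_contains is hypothetical:

```shell
# Hypothetical helper: succeeds if directory $2 is an entry of the
# colon-separated PATH-style string $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Intended use against the remote node (not run here):
#   remote_path=$(rsh Slave -n 'echo $PATH')
#   path_contains "$remote_path" /usr/local/bin || echo "/usr/local/bin is missing"
```

The single quotes around 'echo $PATH' matter: they keep the local shell from expanding the variable, so the value printed is the one seen on the remote node.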

 

 

 

***********************************************************

WHEN THE FULL PATH /usr/local/bin IS GIVEN TO THE hboot COMMAND

***********************************************************

[ahmed_at_Master ahmed]$ rsh Slave -n /usr/local/bin/hboot -t -c lam-conf.lamd -v -s -I "-H 10.0.0.10 -P 1053 -n 0 -o 1"

-----------------------------------------------------------------------------

Synopsis: hboot [-dhnNstv] [-c <schema>] [-I <inet_topo>] [-R <rtr_topo>]

Description: Start LAM on the local node

Options:

-c <conf> Use <conf> as the process schema

-b <name> Use <name> for the unix socket names

-d Print debug information (implies -v)

-h Print this message

-I <inet_topo> Set $inet_topo variable

-N Pretend to hboot (used with recon(1))

-R <rtr_topo> Set $rtr_topo variable

-s Close stdio of processes

-t Kill existing session first

-v Be verbose

-----------------------------------------------------------------------------

 

 

Ahmed irshad
