LAM/MPI General User's Mailing List Archives

From: 460853_at_[hidden]
Date: 2006-12-11 06:59:30


Hi... I'm a total newbie at this (so maybe I won't be able to help at all),
but what does a recon ~/host say? Probably the same, doesn't it? I have tried
it, and that error is what it prints when the remote machine (10.101.11.58)
is off. I guess you have checked all the advice that appears in the error
message... If you're able to run lamboot on 10.101.11.58, I would check that
an ls -la gives the same permissions for all the files, and that the file
structure looks reasonably the same (I mean... that you've got a .profile
with the same permissions on the .58 machine as on the .45 machine, and so
on).
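
A rough way to do the permission comparison suggested above is to print the mode of the home directory and of each startup file, once per machine, and compare the two listings. This is only a sketch: the helper name perm_summary is made up for illustration, the file list is taken from the files named in the error message, and stat -c assumes GNU coreutils (Linux).

```shell
# perm_summary DIR
# Print the octal mode and name of DIR itself and of the startup files
# that LAM's error message mentions. "perm_summary" is a hypothetical
# helper name; "stat -c" assumes GNU stat.
perm_summary() {
    dir=$1
    for f in . .profile .cshrc .rhosts; do
        if [ -e "$dir/$f" ]; then
            printf '%s %s\n' "$(stat -c '%a' "$dir/$f")" "$f"
        else
            printf 'missing %s\n' "$f"
        fi
    done
}

# Example call for the current user's home directory:
#   perm_summary "$HOME"
```

Running it on both the .45 and the .58 machine and diffing the two outputs should show any difference in permissions or missing files at a glance.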

I'm sorry if this only confuses you. This is what I (a LAM novice) would do.

Regards

Quoting bcruchet_at_[hidden]:

>
> Hi!!
>
> Look at your /etc/hosts (GNU/Linux?) and add this:
>
> 10.101.11.45 cpu1
> 10.101.11.58 cpu2
>
> Sometimes the system makes a DNS query, and this takes a long time on
> some systems.
>
> :)
>
>
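
To see whether name lookup is actually the slow step, one option is to ask the system resolver directly. The sketch below assumes a Linux system with getent available; check_host is a made-up helper name, and cpu1/cpu2 are simply the names proposed in the reply above.

```shell
# check_host NAME
# Report whether NAME resolves and to which address. getent goes through
# the system's normal lookup order (on a typical nsswitch.conf that is
# /etc/hosts first, then DNS). "check_host" is a hypothetical helper name.
check_host() {
    if addr=$(getent hosts "$1"); then
        printf 'ok      %s\n' "$addr"
    else
        printf 'FAILED  %s\n' "$1"
        return 1
    fi
}

# Example calls (in bash, prefix with "time" to see how long each takes):
#   check_host cpu1
#   check_host cpu2
```

If a lookup stalls for several seconds before answering, that would be consistent with the DNS delay described in the reply.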
>> Hi,
>>
>> I was trying to use the lamboot command with 2 CPUs. I made a
>> hostfile on 10.101.11.45 like this:
>>
>> 10.101.11.45 user=manojv
>> 10.101.11.58 user=manoj
>>
>> When I run $ lamboot hostfile, it takes a very long time and gives an
>> error (pasted below). I am using a secure connection with ssh keys. I can
>> connect to 10.101.11.58 without a password, and from 10.101.11.58 I can
>> connect to 10.101.11.45.
>>
>> When I use the same command with the same hostfile on 10.101.11.58, it
>> works without any problem.
>>
>> I have made sure that both machines have the same version of LAM
>> (7.1.1).
>>
>> Does anybody have an idea why I am not able to lamboot from 10.101.11.45?
>>
>>
>> The error it gives is pasted below for reference.
>> Thanks.
>>
>> Error:
>> -------------------------------------------------------------
>> manojv@10.101.11.45 $ lamboot ~/host
>>
>> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>>
>> ERROR: LAM/MPI unexpectedly received the following on stderr:
>> eros: Connection refused
>> -----------------------------------------------------------------------------
>> LAM failed to execute a process on the remote node "manoj@10.101.11.58".
>> LAM was not trying to invoke any LAM-specific commands yet -- we were
>> simply trying to determine what shell was being used on the remote
>> host.
>>
>> LAM tried to use the remote agent command "rsh"
>> to invoke "echo $SHELL" on the remote node.
>>
>> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
>> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
>> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
>> *** MAILING LIST.
>>
>> This usually indicates an authentication problem with the remote
>> agent, some other configuration type of error in your .cshrc or
>> .profile file, or you were unable to executable a command on the
>> remote node for some other reason. The following is a list of items
>> that you should check on the remote node:
>>
>> - You have an account and can login to the remote machine
>> - Incorrect permissions on your home directory (should
>>   probably be 0755)
>> - Incorrect permissions on your $HOME/.rhosts file (if you are
>>   using rsh -- they should probably be 0644)
>> - You have an entry in the remote $HOME/.rhosts file (if you
>>   are using rsh) for the machine and username that you are
>>   running from
>> - Your .cshrc/.profile must not print anything out to the
>>   standard error
>> - Your .cshrc/.profile should set a correct TERM type
>> - Your .cshrc/.profile should set the SHELL environment
>>   variable to your default shell
>>
>> Try invoking the following command at the unix command line:
>>
>> rsh 10.101.11.58 -n -l manoj 'echo $SHELL'
>>
>> You will need to configure your local setup such that you will *not*
>> be prompted for a password to invoke this command on the remote node.
>> No output should be printed from the remote node before the output of
>> the command is displayed.
>>
>> When you can get this command to execute successfully by hand, LAM
>> will probably be able to function properly.
>> -----------------------------------------------------------------------------
>>
>>
>> --
>> manoj vaghela
>> zeus numerix pvt ltd
>> aerospace engineering department
>> indian institute of technology bombay
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
>
>
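
The manual test recommended in the error message can be wrapped in a small function that keeps stdout and stderr apart, since the log above shows LAM giving up as soon as something unexpected arrives on stderr. This is a sketch under assumptions: agent_check is a made-up name, and the argument order simply mirrors the command printed in the message (agent, host, -n, -l user).

```shell
# agent_check AGENT HOST USER
# Run the remote agent the way the error message suggests and report
# failure if the command exits non-zero or writes anything to stderr.
# "agent_check" is a hypothetical helper name.
agent_check() {
    agent=$1
    host=$2
    user=$3
    errfile=$(mktemp) || return 2
    # $agent is deliberately unquoted so that a value such as "ssh -x"
    # splits into separate words. 'echo $SHELL' stays single-quoted so
    # that $SHELL is expanded on the remote side, not locally.
    if out=$($agent "$host" -n -l "$user" 'echo $SHELL' 2>"$errfile"); then
        status=0
    else
        status=$?
    fi
    err=$(cat "$errfile")
    rm -f "$errfile"
    if [ "$status" -ne 0 ] || [ -n "$err" ]; then
        printf 'FAIL (exit %s) stderr: %s\n' "$status" "$err"
        return 1
    fi
    printf 'OK remote shell: %s\n' "$out"
}

# Example calls with the values from the log above:
#   agent_check rsh 10.101.11.58 manoj
#   agent_check ssh 10.101.11.58 manoj
```

The log shows LAM invoking rsh, while the original post describes ssh keys, so it may be worth running the check with both agents. LAM's documentation describes a LAMRSH environment variable for selecting the remote agent; whether that is the cause here is not confirmed anywhere in this thread.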