
LAM/MPI General User's Mailing List Archives

From: Shashwat Srivastav (ssrivast_at_[hidden])
Date: 2003-12-26 14:19:35


Hi,

If you are using ssh for lamboot, the ssh keys on your nodes must be
set up so that you can ssh from one node to another without any
interaction. From the error messages, it appears that this is where
lamboot fails. Can you log in to the first node, ssh from there to the
other nodes, and confirm that you get in without entering a password?
Please let me know if this is not the problem.
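
If there are many nodes, the check can be scripted. What follows is only a minimal sketch and not part of LAM: the function name check_ssh and the 5-second timeout are illustrative choices, and it assumes OpenSSH, whose BatchMode option makes ssh fail instead of prompting for a password. It also flags hosts whose shell start-up scripts print output, since that is one of the prerequisites recon lists.

```shell
#!/bin/sh
# Sketch: verify that each host accepts a non-interactive ssh login.
# Usage (hypothetical script name): sh check_ssh.sh node1 node2 ...

# check_ssh HOST
#   returns 0 if "ssh HOST true" succeeds silently,
#   returns 1 if the login fails (for example, a password would be needed),
#   returns 2 if the login works but the remote shell printed something.
check_ssh() {
    # BatchMode=yes (OpenSSH) disables password and passphrase prompts.
    out=$(ssh -x -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>&1) || return 1
    # Any output here comes from the remote shell's start-up scripts.
    [ -z "$out" ] || return 2
}

for host in "$@"; do
    if check_ssh "$host"; then
        echo "$host: ok"
    else
        echo "$host: FAILED"
    fi
done
```

Run it from the first node with the other node names as arguments; any host reported as FAILED is one where lamboot over ssh is likely to fail as well.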

Thanks.

--
Shashwat Srivastav
LAM / MPI Developer (http://www.lam-mpi.org)
Indiana University
http://www.cs.indiana.edu/~ssrivast
On Dec 25, 2003, at 4:25 AM, PeiQuan Chen wrote:
> Dear lam-user,
>
> I have installed lam-7.0.3 on a Linux cluster that uses ssh in place of
> rsh and uses PBS as the TM.
>
> But when I try to run lamboot, I get this error message:
> LAM 7.0.3/MPI 2 C++/ROMIO - Indiana University
>
> -----------------------------------------------------------------------------
>
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
>
> lamboot: wipe -- nothing to do
>
> I then ran recon -d to investigate the error. It told me:
> recon was not able to complete successfully.  There can be any number
> of problems that did not allow recon to work properly.  You should use
> the "-d" option to recon to get more information about each step that
> recon attempts.
>
> Any error message above may present a more detailed description of the
> actual problem.
>
> Here is general a list of prerequisites that *must* be fulfilled
> before recon can work:
>
>         - Each machine in the hostfile must be reachable and operational.
>         - You must have an account on each machine.
>         - You must be able to rsh(1) to the machine (permissions
>           are typically set in the user's $HOME/.rhosts file).
>
>         *** Sidenote: If you compiled LAM to use a remote shell program
>             other than rsh (with the --with-rsh option to ./configure;
>             e.g., ssh), or if you set the LAMRSH environment variable
>             to an alternate remote shell program, you need to ensure
>             that you can execute programs on remote nodes with no
>             password.  For example:
>
>         unix% ssh -x pinky uptime
>         3:09am up 211 day(s), 23:49, 2 users, load average: 0.01, 0.08, 0.10
>
>         - The LAM executables must be locatable on each machine, using
>           the shell's search path and possibly the LAMHOME environment
>           variable.
>         - The shell's start-up script must not print anything on standard
>           error.  You can take advantage of the fact that rsh(1) will
>           start the shell non-interactively.  The start-up script (such
>           as .profile or .cshrc) can exit early in this case, before
>           executing many commands relevant only to interactive sessions
>           and likely to generate output.
> -----------------------------------------------------------------------------
>
> n0<12922> ssi:boot:rsh: finalizing
> n0<12922> ssi:boot: Closing
>
> Could anybody help me to resolve this problem?
> Thank you in advance
>
> PeiQuan Chen
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>