
LAM/MPI General User's Mailing List Archives


From: PeiQuan Chen (lammpi_at_[hidden])
Date: 2003-12-25 04:25:42


Dear lam-user,

I have installed LAM 7.0.3 on a Linux cluster that uses ssh in place of rsh,
and uses PBS for TM (task management).

But when I try to run lamboot, I get the following error message:
LAM 7.0.3/MPI 2 C++/ROMIO - Indiana University

-----------------------------------------------------------------------------

lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------

lamboot: wipe -- nothing to do

I then ran recon -d to diagnose the error, and it reported:
recon was not able to complete successfully. There can be any number
of problems that did not allow recon to work properly. You should use
the "-d" option to recon to get more information about each step that
recon attempts.

Any error message above may present a more detailed description of the
actual problem.

Here is general a list of prerequisites that *must* be fulfilled
before recon can work:

        - Each machine in the hostfile must be reachable and operational.
        - You must have an account on each machine.
        - You must be able to rsh(1) to the machine (permissions
          are typically set in the user's $HOME/.rhosts file).

        *** Sidenote: If you compiled LAM to use a remote shell program
            other than rsh (with the --with-rsh option to ./configure;
            e.g., ssh), or if you set the LAMRSH environment variable
            to an alternate remote shell program, you need to ensure
            that you can execute programs on remote nodes with no
            password. For example:

        unix% ssh -x pinky uptime
        3:09am up 211 day(s), 23:49, 2 users, load average: 0.01, 0.08, 0.10

        - The LAM executables must be locatable on each machine, using
          the shell's search path and possibly the LAMHOME environment
          variable.
        - The shell's start-up script must not print anything on standard
          error. You can take advantage of the fact that rsh(1) will
          start the shell non-interactively. The start-up script (such
          as .profile or .cshrc) can exit early in this case, before
          executing many commands relevant only to interactive sessions
          and likely to generate output.
-----------------------------------------------------------------------------

n0<12922> ssi:boot:rsh: finalizing
n0<12922> ssi:boot: Closing
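Part of the recon checklist above can be tested by hand before calling lamboot. The following is a minimal POSIX sh sketch, not part of LAM; the function names `check_host` and `check_hostfile` are hypothetical. It assumes a hostfile with one host name per line (extra fields and `#` comments are ignored) and takes the remote shell command from `LAMRSH`, falling back to `ssh -x -o BatchMode=yes` (that default is the sketch's own choice, not LAM's build-time setting). It covers only two prerequisites: running a remote command without a password, and the remote start-up scripts printing nothing on stderr. It does not check that the LAM executables are on the remote search path.

```shell
#!/bin/sh
# Hypothetical pre-flight check for two of the recon prerequisites:
#   1. each host runs a remote command without asking for a password;
#   2. the remote shell's start-up scripts write nothing to stderr.
# Assumptions: one host name per line in the hostfile ('#' comments,
# blank lines and extra fields ignored); the remote shell command comes
# from LAMRSH, defaulting (in this sketch only) to ssh in batch mode.

check_host() {
    host=$1
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    rsh_cmd=${LAMRSH:-ssh -x -o BatchMode=yes}
    # Capture only stderr: 2>&1 points stderr at the substitution, then
    # >/dev/null discards stdout. </dev/null keeps the remote command
    # from reading the caller's stdin.
    err=$($rsh_cmd "$host" true 2>&1 >/dev/null </dev/null)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "$host: remote command failed (exit $status): $err"
        return 1
    fi
    if [ -n "$err" ]; then
        echo "$host: start-up script wrote to stderr: $err"
        return 1
    fi
    echo "$host: ok"
    return 0
}

check_hostfile() {
    failures=0
    # Strip comments, keep the first field, skip empty lines.
    for host in $(sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'); do
        check_host "$host" || failures=$((failures + 1))
    done
    # Succeed only if every host passed.
    [ "$failures" -eq 0 ]
}
```

To use it, source the file (for example `. ./precheck.sh`, a hypothetical name) and run `check_hostfile myhosts`, where `myhosts` is the same boot schema passed to lamboot. Any host reported as failing is a likely cause of the lamboot error.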

Could anybody help me resolve this problem?
Thank you in advance.

PeiQuan Chen
