
LAM/MPI General User's Mailing List Archives


From: McCalla, Mac (macmccalla_at_[hidden])
Date: 2005-04-05 02:48:31


Hi,
The initial error message indicates that the shell on the remote node cannot find the LAM executables in the directories listed in its $PATH environment variable. Have you tried the hints listed in the text following the error message?
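
One way to check this is sketched below, assuming a POSIX shell. The function name check_lam_path is made up for illustration; hboot and tkill are the two binaries named in the error output. It reports which of them cannot be found in a given PATH string.

```shell
# Sketch only: check_lam_path is a hypothetical helper, not part of LAM.
# Prints "ok" if hboot and tkill are both found in the given PATH string,
# otherwise prints "missing:" followed by the names that were not found.
check_lam_path() {
    search_path=$1
    missing=""
    for bin in hboot tkill; do
        # Do the lookup in a subshell so the caller's PATH is left untouched.
        if ! ( PATH=$search_path; command -v "$bin" >/dev/null 2>&1 ); then
            missing="$missing $bin"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

To test the PATH that a non-interactive ssh session actually sees, something like check_lam_path "$(ssh compute-0-0 'echo $PATH')" could be used, assuming passwordless ssh to compute-0-0 already works.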

As a side issue, I would recommend moving to a newer version of LAM if at all possible (my installation has been at 7.1.1 for many months).

Hope this helps,

Mac McCalla
--------------------------
Sent from my BlackBerry Wireless Handheld

-----Original Message-----
From: lam-bounces_at_[hidden] <lam-bounces_at_[hidden]>
To: lam_at_[hidden] <lam_at_[hidden]>
Sent: Tue Apr 05 00:46:17 2005
Subject: LAM: lamboot problem

I am facing the problem shown below during lamboot. Is there a solution, please?

Kirubakaran

[test_at_bioinfo test]$ lamboot -v hostfile

LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University

n-1<3123> ssi:boot:base:linear: booting n0 (bioinfo)
n-1<3123> ssi:boot:base:linear: booting n1 (compute-0-0)
ERROR: LAM/MPI unexpectedly received the following on stderr:
bash: line 1: hboot: command not found
-----------------------------------------------------------------------------
LAM failed to execute a LAM binary on the remote node "compute-0-0".
Since LAM was already able to determine your remote shell as "hboot",
it is probable that this is not an authentication problem.

LAM tried to use the remote agent command "ssh"
to invoke the following command:

        ssh -x compute-0-0 -n hboot -t -c lam-conf.lamd -v -s -I "-H 10.1.1.1 -P 32941 -n 1 -o 0"

This can indicate several things. You should check the following:

        - The LAM binaries are in your $PATH
        - You can run the LAM binaries
        - The $PATH variable is set properly before your
          .cshrc/.profile exits

Try to invoke the command listed above manually at a Unix prompt.

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<3123> ssi:boot:base:linear: Failed to boot n1 (compute-0-0)
n-1<3123> ssi:boot:base:linear: aborted!
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
n-1<3129> ssi:boot:base:linear: booting n0 (bioinfo)
n-1<3129> ssi:boot:base:linear: booting n1 (compute-0-0)
ERROR: LAM/MPI unexpectedly received the following on stderr:
bash: line 1: tkill: command not found
-----------------------------------------------------------------------------
LAM failed to execute a LAM binary on the remote node "compute-0-0".
Since LAM was already able to determine your remote shell as "tkill",
it is probable that this is not an authentication problem.

LAM tried to use the remote agent command "ssh"
to invoke the following command:

        ssh -x compute-0-0 -n tkill -v

This can indicate several things. You should check the following:

        - The LAM binaries are in your $PATH
        - You can run the LAM binaries
        - The $PATH variable is set properly before your
          .cshrc/.profile exits

Try to invoke the command listed above manually at a Unix prompt.

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<3129> ssi:boot:base:linear: Failed to boot n1 (compute-0-0)
n-1<3129> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully
[test_at_bioinfo test]$
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/