LAM/MPI General User's Mailing List Archives

From: Tony Arcieri (tarcieri_at_[hidden])
Date: 2004-11-02 15:59:55


I'm trying to run LAM/MPI on an Xserve cluster running Mac OS X 10.3.5,
with LAM/MPI installed from a package. I did this successfully on a
cluster a few months ago, but now that we have our actual cluster I'm
running into problems with lamboot.

I have a lamhosts file containing the IPs of two systems (there are many
more, but I'm just trying to get it going on two nodes for now). When I
execute lamboot -v lamhosts, I get the following:

node1:~ ccastro$ lamboot -v lamhosts

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

n-1<1359> ssi:boot:base:linear: booting n0 (10.0.0.1)
n-1<1359> ssi:boot:base:linear: booting n1 (10.0.0.2)
ERROR: LAM/MPI unexpectedly received the following on stderr:
sh: line 1: [: missing `]'
sh: line 1: hboot: command not found

[...]

LAM tried to use the remote agent command "ssh"
to invoke the following command:

        ssh 10.0.0.2 -n '( ! [ -e ./.profile] || . ./.profile;' hboot -t
-c lam-conf.lamd -v -s -I '"-H 10.0.0.1 -P 50298 -n 1 -o 0"' )

Correct me if I'm wrong, but the single quote ordering appears to be off,
and clearly test does not like the bracket being placed right next to the
filename.
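
The bracket problem can be reproduced locally, without LAM or ssh. The
following is a minimal sketch, assuming a POSIX sh; the function names and
the variable are made up for illustration and are not part of LAM.

```shell
#!/bin/sh
# Hypothetical reproduction: '[' is an ordinary command that requires
# a separate, final ']' argument.
profile=./.profile

# Malformed, as in the command LAM generated: ']' is glued to the
# filename, so '[' never sees its closing bracket and reports an error.
check_glued() {
    [ -e "$profile"] 2>/dev/null
}

# Well-formed: ']' is its own argument.
check_spaced() {
    [ -e "$profile" ]
}

check_glued;  glued_status=$?
check_spaced; spaced_status=$?

# An exit status greater than 1 means 'test' itself failed (a syntax
# error); 0 or 1 only says whether the file exists.
echo "glued=$glued_status spaced=$spaced_status"
```

With the space in place, the test returns 0 or 1 depending on whether
.profile exists, which is what the "! [ ... ] || . ./.profile" construct
relies on.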

Regardless, .profile as well as /etc/profile are configured so the LAM
utilities are in the path (although neither seems to be processed by ssh
for non-login shells), and whatever glue LAM is using to attempt to
process them is evidently failing. Example:

node1:~ ccastro$ ssh node2 source .profile;hboot
-----------------------------------------------------------------------------
The booted program is missing at least one of the -H, -P, or -n
command line arguments. These arguments are required to tell the
booted program how to contact the booting agent.

Cannot continue. Sorry.
-----------------------------------------------------------------------------
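
One caveat about the example above: because the ";" is not quoted, the
local shell splits the line there, so only "source .profile" is sent
through ssh and hboot runs on node1. The sketch below illustrates the
difference without needing a second host. It assumes a POSIX sh;
fake_ssh and WHERE are hypothetical stand-ins, not real ssh or LAM
features.

```shell
#!/bin/sh
# Stand-in for "ssh host ...": like ssh, it joins its arguments with
# spaces and hands the result to a fresh shell. WHERE marks which shell
# ran a given command.
WHERE=local; export WHERE
fake_ssh() { WHERE=remote sh -c "$*"; }

# Unquoted ';': the calling shell splits the line, so the second echo
# runs in the local shell.
unquoted=$(fake_ssh 'echo $WHERE'; echo $WHERE)

# Quoted: the whole string, including ';', reaches the "remote" shell.
quoted=$(fake_ssh 'echo $WHERE; echo $WHERE')

echo "unquoted:" $unquoted
echo "quoted:" $quoted
```

The equivalent with real ssh would be to quote the whole remote command,
for example ssh node2 '. ./.profile; hboot', so that both parts are
executed on node2.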

Is there any way to alter the method by which lamboot invokes hboot on
the other hosts without recompiling LAM? The command it is trying to
execute is clearly malformed, at least according to OS X's bash.

Tony Arcieri