On Oct 17, 2005, at 10:09 PM, James Dorsey wrote:
> I noted that the user manual (v7.1.1 page 22, second table) said LAM would
> call ~/.profile when a non-interactive sh shell was called. This doesn't
> appear to be happening (when using 7.1.2b26). I put an echo foo in
> ~/.profile, and it (foo) wasn't appearing when LAM tried to use a
> non-interactive remote shell. It *was* appearing when I started an
> interactive remote shell. It seems something is broken on my installation
> at least. So, back to the quick dirty fix as above with the links...
LAM should definitely be running your .profile on the remote nodes.
Can you run "lamboot -d" with the b26 version and send the output?
>> I *think* that many of your problems stem from LAM 7.1.1 putting the ]
>> in the wrong place -- it's missing a space, causing the parsing on the
>> remote node to go badly.
>
>> The latest beta of 7.1.2 fixes this issue -- could you give that a
>> whirl? It might also fix your $LAMHOME/PATH issues (if the shell
>> parsing is wrong right off the bat, other things can go wrong).
>
> Same issue as with 7.1.1. The close bracket is at the end of the line, LAM
> barfs on stderr output, and suggests the same test command. Which still
> won't work until I swap the order of the last two non-white characters -
> the ' and the ) in:
>
> rsh euler -n '( ! [ -e ./.profile] || . ./profile;' hboot -t -c
> lam-conf.lamd -d -v -s -I '"-H 192.168.0.10 -P 60547 -n 1 -o 0"' )
Are you absolutely sure that you're running b26? Somewhere early in
the 7.1.2 betas, we inserted the missing " " before ]. So the b26
line should read like this:
> rsh euler -n '( ! [ -e ./.profile ] || . ./profile;' hboot -t -c
> lam-conf.lamd -d -v -s -I '"-H 192.168.0.10 -P 60547 -n 1 -o 0"' )
Note the space before the ].
> Here's the error message. Much less detail this time, as the issue seems
> to be the same as in my last mail:
>
> <Preceded by lots of nice looking happy messages>
> n-1<1028> ssi:boot:rsh: attempting to execute: rsh euler -n '( ! [ -e
> ./.profile] || . ./.profile;' hboot -t -c lam-conf.lamd -d -v -s -I '"-H
> 192.168.0.10 -P 55089 -n 1 -o 0"' )
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> [: missing ]
This is exactly what you'd expect from a missing space before the ].
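If you want to see the effect in isolation, without LAM or rsh involved,
something like the following should reproduce it on most Bourne-style
shells. It is only a rough sketch: the scratch directory and the variable
names are made up for the illustration, and the exact wording of the error
message varies from shell to shell. The point is that without the space,
the ] is glued onto the filename, so [ never sees its closing argument.

```sh
#!/bin/sh
# Illustration only: compare the test with and without a space before "]".
# The scratch directory and variable names are assumptions for this demo.

tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch .profile    # make sure the file being tested for exists

# No space before "]": the shell passes "./.profile]" as one word,
# so "[" complains that its closing "]" is missing.
bad_status=0
bad_msg=$(sh -c '[ -e ./.profile]' 2>&1) || bad_status=$?

# Space before "]": "]" is its own argument and the test succeeds.
good_status=0
good_msg=$(sh -c '[ -e ./.profile ]' 2>&1) || good_status=$?

echo "without space: status=$bad_status stderr=$bad_msg"
echo "with space:    status=$good_status stderr=$good_msg"
```

The first case should exit non-zero and print a "missing ]" complaint on
stderr, which is the same thing LAM is picking up from the remote node.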
Can you double check your PATH and whatnot to ensure that you're
running b26 on the local and remote nodes?
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/