There is no /usr/bin/lamd. I installed LAM in my home directory
and set LAMHOME in .bashrc. /tmp is writable.
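
For anyone repeating these checks, here is a minimal sketch of how they
could be scripted. It assumes lamboot, hboot and lamd are all meant to
come from one bin directory on PATH; the helper names (locate_tools,
same_install, tmp_is_writable) are made up for illustration and are not
part of LAM. It does not compare version numbers, only where each
binary is found.

```python
import os
import shutil
import tempfile


def locate_tools(names=("lamboot", "hboot", "lamd"), path=None):
    """Map each tool name to the full path found on PATH, or None."""
    return {name: shutil.which(name, path=path) for name in names}


def same_install(found):
    """True when every tool was found and all live in one directory."""
    dirs = {os.path.dirname(p) for p in found.values() if p}
    return all(found.values()) and len(dirs) == 1


def tmp_is_writable(directory="/tmp"):
    """Try to create (and automatically remove) a scratch file."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    found = locate_tools()
    for name, where in found.items():
        print(f"{name}: {where or 'not found on PATH'}")
    print("same install directory:", same_install(found))
    print("/tmp writable:", tmp_is_writable())
```

If the three binaries resolve to different directories, two LAM
installations are probably being mixed, which is the first thing the
quoted reply below asks to rule out.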
The random sockets could be a problem. I'm not sure how to test
that, but if I run
$ cat > /dev/udp/localhost/17
there is no error, whereas
$ cat > /dev/tcp/localhost/17
bash: connect: Connection refused
bash: /dev/tcp/localhost/17: Connection refused
Is it tcp or udp that is needed?
I was hoping to get LAM up without having to get hold
of the superuser.
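
A note on the port 17 test above: it probably does not say much either
way. UDP is connectionless, so opening /dev/udp/localhost/17 succeeds
whether or not anything is listening, and the TCP "Connection refused"
only shows that nothing is listening on port 17. A more direct check is
to open a listener on a random local port and connect to it. The sketch
below does that for both protocols, since the thread does not settle
which one LAM needs. The function names are made up for illustration,
and it assumes an unprivileged user is allowed to bind to 127.0.0.1.

```python
import socket


def tcp_loopback_ok(host="127.0.0.1", timeout=2.0):
    """Listen on an OS-chosen port, connect to it, and pass one message."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.settimeout(timeout)
            server.bind((host, 0))  # port 0: let the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            with socket.create_connection((host, port), timeout=timeout) as client:
                conn, _ = server.accept()
                with conn:
                    conn.settimeout(timeout)
                    client.sendall(b"ping")
                    return conn.recv(4) == b"ping"
    except OSError:
        # Covers refused connections, timeouts and bind failures alike.
        return False


def udp_loopback_ok(host="127.0.0.1", timeout=2.0):
    """Send one datagram to an OS-chosen local port and read it back."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
             socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            receiver.settimeout(timeout)
            receiver.bind((host, 0))
            port = receiver.getsockname()[1]
            sender.sendto(b"ping", (host, port))
            data, _ = receiver.recvfrom(16)
            return data == b"ping"
    except OSError:
        return False


if __name__ == "__main__":
    print("TCP to a random local port:", "ok" if tcp_loopback_ok() else "FAILED")
    print("UDP to a random local port:", "ok" if udp_loopback_ok() else "FAILED")
```

If either line reports FAILED, a local packet filter is a likely
suspect, and that can be established without root access.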
On Wed, Sep 04, 2002 at 01:31:29AM -0500, Vishal Sahay wrote:
> It looks like the fork is failing, somehow.
> Check for the following things:
>
> - /usr/bin/lamd is the same version of LAM as lamboot. See if
> lamboot is in /usr/bin, and that they're both 6.5.6.
>
> - /tmp is writable?
>
> - Firewall software is installed such that opening random sockets to
> localhost will fail.
>
>
> -Vishal Sahay
> ===================================================================
> (Graduate Student, CS Dept. Make Today A LAM/MPI Day :)
> Indiana University, Bloomington) http://www.lam-mpi.org
> http://cs.indiana.edu/~vsahay
> ===================================================================
>
> On Sat, 31 Aug 2002, David Shattuck wrote:
>
> # Hi -
> #
> # I am trying to boot a lam cluster with two machines. One of these cannot
> # lamboot itself. When I try, I get an error message with no description of
> # the error. Any idea what could be causing this? I have included the
> # output of both "lamboot" and "lamboot -d -v" below. SSH to the machine
> # works fine, and I have LAMRSH set to "ssh -x".
> #
> # thanks,
> # David Shattuck
> # UCLA Laboratory of Neuro Imaging
> #
> #
> #
> #
> #
> #
> # [glitch_at_wulfpet3 glitch]$ lamboot
> #
> # LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
> #
> # -----------------------------------------------------------------------------
> # lamboot encountered some error (see above) during the boot process,
> # and will now attempt to kill all nodes that it was previously able to
> # boot (if any).
> #
> # Please wait for LAM to finish; if you interrupt this process, you may
> # have LAM daemons still running on remote nodes.
> # -----------------------------------------------------------------------------
> #
> # LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
> #
> # [glitch_at_wulfpet3 glitch]$ lamboot -d -v
> #
> # LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
> #
> # lamboot: boot schema file: /etc/lam/lam-bhost.def
> # lamboot: opening hostfile /etc/lam/lam-bhost.def
> # lamboot: found the following hosts:
> # lamboot: n0 localhost
> # lamboot: resolved hosts:
> # lamboot: n0 localhost --> 127.0.0.1
> # lamboot: found 1 host node(s)
> # lamboot: origin node is 0 (localhost)
> # Executing hboot on n0 (localhost - 1 CPU)...
> # lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H
> # 127.0.0.1 -P 32835 -n 0 -o 0 ""
> # hboot: process schema = "/etc/lam/lam-conf.lam"
> # hboot: found /usr/bin/lamd
> # hboot: performing tkill
> # hboot: tkill
> # hboot: booting...
> # hboot: fork /usr/bin/lamd
> # [1] 10980 lamd -H 127.0.0.1 -P 32835 -n 0 -o 0 -d
> # hboot: attempting to execute
> # -----------------------------------------------------------------------------
> # lamboot encountered some error (see above) during the boot process,
> # and will now attempt to kill all nodes that it was previously able to
> # boot (if any).
> #
> # Please wait for LAM to finish; if you interrupt this process, you may
> # have LAM daemons still running on remote nodes.
> # -----------------------------------------------------------------------------
> # wipe ...
> #
> # LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
> #
> # Executing tkill on n0 (localhost)...
> # lamboot did NOT complete successfully
> # [glitch_at_wulfpet3 glitch]$
> #
> #
> # _______________________________________________
> # This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> #
--
_ _ _ _ _ _ _ _
-_- -_- - -_- -_- - -_- -_- - -_- -_- -