On Mar 29, 2007, at 4:01 AM, Van-Khiem Truong wrote:
> I would like to get help for the "lamboot" procedure. I have installed
> the code LAM-MPI on two machines HP-UX, the first one is a PA-Risc 2.0,
> the second one is a multiprocessor HP Itanium.
>
> The installation seems to be fine, except for the module ptmalloc2
> (/share/memory/ptmalloc2) where I need to change the "Makefile" to
> remove the file malloc.c, otherwise the code tells me that variables
> are already declared.
You should configure with --without-memory-manager and then you won't
have this problem.
> So on the machine HP PA-Risc, I can start the procedure "lamboot" and
> connect to another PA-Risc HP. However for the machine HP
> multiprocessor, it tells me that it boots but the call back doesn't
> work. I attach hereby the file containing the error message.
See below.
> [snip]
> n-1<1698> ssi:boot:base: looking for boot schema file:
> n-1<1698> ssi:boot:base: hostfile
> n-1<1698> ssi:boot:base: found boot schema: hostfile
> n-1<1698> ssi:boot:rsh: found the following hosts:
> n-1<1698> ssi:boot:rsh: n0 nanopus (cpu=1)
> n-1<1698> ssi:boot:rsh: n1 hudson (cpu=1)
> n-1<1698> ssi:boot:rsh: resolved hosts:
> n-1<1698> ssi:boot:rsh: n0 nanopus --> 125.1.5.218 (origin)
> n-1<1698> ssi:boot:rsh: n1 hudson --> 125.1.7.17
> [snip]
> n-1<1698> ssi:boot:rsh: starting on n0 (nanopus): hboot -t -c
> lam-conf.lamd -d -v -I -H 125.1.5.218 -P 49939 -n 0 -o 0
> n-1<1698> ssi:boot:rsh: launching locally
> [snip]
> hboot: attempting to execute [1] 1701 lamd -H 125.1.5.218 -P 49939 -n 0 -o 0 -d
> n-1<1698> ssi:boot:rsh: successfully launched on n0 (nanopus)
> n-1<1698> ssi:boot:base:server: expecting connection from finite list
> -----------------------------------------------------------------------------
> The lamboot agent timed out while waiting for the newly-booted process
> to call back and indicated that it had successfully booted.
> [snip]
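What is truly odd here is that the lamd that lamboot is waiting for
is the *local* lamd: n0 was launched on nanopus itself ("launching
locally"), so the callback never has to leave that machine.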
Did you check that you have no TCP filtering / firewall software
running?
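
One rough way to test the local callback path outside of LAM is
sketched below in Python. It is only an illustration and not part of
LAM: the function name check_local_callback is made up here. It binds a
listening socket on the given address and connects back to it from the
same machine, much as the local lamd has to connect back to lamboot on
125.1.5.218. A success does not rule out filtering on other ports or
between nanopus and hudson. A failure suggests that local TCP traffic to
that address is blocked, or that the address is not configured on this
host.

```python
import socket
import sys


def check_local_callback(host, timeout=5.0):
    """Return True if this machine can open a TCP connection to itself on `host`.

    Hypothetical helper, not a LAM tool. It listens on an ephemeral port
    bound to `host`, connects to that port from the same process, and
    accepts the connection.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((host, 0))      # port 0: let the OS pick a free port
            server.listen(1)
            server.settimeout(timeout)  # do not wait forever in accept()
            port = server.getsockname()[1]
            # The "call back": connect to the listener we just created.
            with socket.create_connection((host, port), timeout=timeout):
                conn, _ = server.accept()
                conn.close()
        return True
    except OSError:
        # Covers bind failures (address not local), refused or filtered
        # connections, name resolution errors, and timeouts.
        return False


if __name__ == "__main__":
    # Default to this machine's own hostname if no address is given.
    target = sys.argv[1] if len(sys.argv) > 1 else socket.gethostname()
    result = "ok" if check_local_callback(target) else "FAILED"
    print(target, result)
```

On nanopus this could be run as "python3 check_callback.py 125.1.5.218"
(the script file name is arbitrary), using the address that lamboot
passed to hboot with -H.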
--
Jeff Squyres
Cisco Systems