LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-11-20 07:05:39


Note that -prefix is only supported in the rsh and globus boot SSI
modules; it is not supported in the tm (PBS) boot SSI module.

PBS is a little different from rsh/ssh environments. PBS essentially
assumes that you have homogeneous systems and pushes a copy of your
environment to all nodes where you start tasks. Hence, the best way to
use multiple LAM installations in a PBS environment is to set your
$PATH appropriately before you run lamboot. You might want to read the
TM boot module section of the LAM/MPI User's Guide for more information.
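
As a rough sketch of the $PATH approach, a PBS job script could do
something like the following. The prefix is the one reported by laminfo
in the quoted message below; the LAM_PREFIX variable name and the check
for an executable lamboot are illustrative assumptions, not anything
LAM itself requires.

```shell
#!/bin/sh
# Sketch of a PBS job script fragment (assumed layout, adjust for your site).
# LAM_PREFIX is a hypothetical variable name; the path is the install
# prefix shown in the quoted laminfo output.
LAM_PREFIX=/fltapps/boeing/cfd/mpi/lam7.1.1_pgf

# Put this installation's bin directory first so that lamboot and the
# other LAM executables resolve to the intended installation.
PATH="$LAM_PREFIX/bin:$PATH"
export PATH

# Boot without -prefix; only attempt it if lamboot really is there.
if [ -x "$LAM_PREFIX/bin/lamboot" ]; then
    lamboot -d
else
    echo "lamboot not found under $LAM_PREFIX/bin" >&2
fi
```

Since PBS pushes a copy of your environment to the nodes where tasks
start, setting $PATH in the job script before lamboot should be enough,
provided LAM lives at the same location on every node.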

If you have LAM installed in different locations on the nodes in your
PBS cluster, things may not work properly, since every node gets the
same copy of your environment (including $PATH).

On Nov 18, 2004, at 12:53 PM, Borenstein, Bernard S wrote:

> The new -prefix option to lamboot looks like a powerful new feature
> to use when you want to run multiple LAM
> installations. I built a recent beta of LAM, tried to use the -prefix
> option, and got the error shown below.
>
> My lamboot command is as follows (I'm running under PBS):
>
> lamboot -d -prefix /fltapps/boeing/cfd/mpi/lam7.1.1_pgf
>
> Here is the debug output :
>
> n-1<30129> ssi:boot:open: opening
> n-1<30129> ssi:boot:open: opening boot module globus
> n-1<30129> ssi:boot:open: opened boot module globus
> n-1<30129> ssi:boot:open: opening boot module rsh
> n-1<30129> ssi:boot:open: opened boot module rsh
> n-1<30129> ssi:boot:open: opening boot module slurm
> n-1<30129> ssi:boot:open: opened boot module slurm
> n-1<30129> ssi:boot:open: opening boot module tm
> n-1<30129> ssi:boot:open: opened boot module tm
> n-1<30129> ssi:boot:select: initializing boot module tm
> n-1<30129> ssi:boot:tm: module initializing
> n-1<30129> ssi:boot:tm:verbose: 1000
> n-1<30129> ssi:boot:tm:priority: 75
> n-1<30129> ssi:boot:select: boot module available: tm, priority: 75
> n-1<30129> ssi:boot:select: initializing boot module slurm
> n-1<30129> ssi:boot:slurm: not running under SLURM
> n-1<30129> ssi:boot:select: boot module not available: slurm
> n-1<30129> ssi:boot:select: initializing boot module rsh
> n-1<30129> ssi:boot:rsh: module initializing
> n-1<30129> ssi:boot:rsh:agent: rsh
> n-1<30129> ssi:boot:rsh:username: <same>
> n-1<30129> ssi:boot:rsh:verbose: 1000
> n-1<30129> ssi:boot:rsh:algorithm: linear
> n-1<30129> ssi:boot:rsh:no_n: 0
> n-1<30129> ssi:boot:rsh:no_profile: 0
> n-1<30129> ssi:boot:rsh:fast: 0
> n-1<30129> ssi:boot:rsh:ignore_stderr: 0
> n-1<30129> ssi:boot:rsh:priority: 10
> n-1<30129> ssi:boot:select: boot module available: rsh, priority: 10
> n-1<30129> ssi:boot:select: initializing boot module globus
> n-1<30129> ssi:boot:globus: globus-job-run not found, globus boot will not run
> n-1<30129> ssi:boot:select: boot module not available: globus
> n-1<30129> ssi:boot:select: finalizing boot module slurm
> n-1<30129> ssi:boot:slurm: finalizing
> n-1<30129> ssi:boot:select: closing boot module slurm
> n-1<30129> ssi:boot:select: finalizing boot module rsh
> n-1<30129> ssi:boot:rsh: finalizing
> n-1<30129> ssi:boot:select: closing boot module rsh
> n-1<30129> ssi:boot:select: finalizing boot module globus
> n-1<30129> ssi:boot:globus: finalizing
> n-1<30129> ssi:boot:select: closing boot module globus
> n-1<30129> ssi:boot:select: selected boot module tm
> n-1<30129> ssi:boot:tm: found the following 4 hosts:
> n-1<30129> ssi:boot:tm:   n0 hsd354 (cpu=2)
> n-1<30129> ssi:boot:tm:   n1 hsd352 (cpu=2)
> n-1<30129> ssi:boot:tm:   n2 hsd351 (cpu=2)
> n-1<30129> ssi:boot:tm:   n3 hsd350 (cpu=2)
> n-1<30129> ssi:boot:tm: starting RTE procs
> n-1<30129> ssi:boot:base:linear_windowed: starting
> n-1<30129> ssi:boot:base:linear_windowed: window size: 5
> n-1<30129> ssi:boot:base:server: opening server TCP socket
> n-1<30129> ssi:boot:base:server: opened port 35051
> n-1<30129> ssi:boot:base:linear_windowed: booting n0 (hsd354)
> n-1<30129> ssi:boot:tm: starting wipe on (hsd354)
> n-1<30129> ssi:boot:tm: starting on n0 (hsd354): /fltapps/boeing/cfd/mpi/lam7.1.1_pgf/bin/tkill -setsid -d
> n-1<30129> ssi:boot:tm: successfully launched on n0 (hsd354)
> n-1<30129> ssi:boot:tm: waiting for completion on n0 (hsd354)
> n-1<30129> ssi:boot:tm: finished on n0 (hsd354)
> n-1<30129> ssi:boot:tm: starting lamd on (hsd354)
> base: cannot open lam-conf.lamd: No such file or directory
>
> -----------------------------------------------------------------------------
> hboot could not parse the boot configuration file.  A number
> of problems can result in this error messages:
>
>   - Is the configuration file installed properly?
>   - Did you specify a file name that does not exit when
>     using the -c option to lamboot?
>
> -----------------------------------------------------------------------------
>
> -----------------------------------------------------------------------------
> It seems that there is no lamd running on the host hsd354.
>
> This indicates that the LAM/MPI runtime environment is not operating.
> The LAM/MPI runtime environment is necessary for the "lamhalt"
> command.
>
> Please run the "lamboot" command the start the LAM/MPI runtime
> environment.  See the LAM/MPI documentation for how to invoke
> "lamboot" across multiple machines.
>
> -----------------------------------------------------------------------------
>
> The file lam-conf.lamd exists in my LAMHOME/etc directory.  Please
> note that the job runs if I remove the -prefix option from my
> lamboot command.
>
>
>
> Here is my laminfo -all output:
>
>             LAM/MPI: 7.2b1svn9913
>             SSI boot: globus (SSI v1.0, API v1.1, Module v0.6)
>             SSI boot: rsh (SSI v1.0, API v1.1, Module v1.1)
>             SSI boot: slurm (SSI v1.0, API v1.1, Module v1.0)
>             SSI boot: tm (SSI v1.0, API v1.1, Module v1.1)
>             SSI coll: lam_basic (SSI v1.0, API v1.1, Module v7.1)
>             SSI coll: shmem (SSI v1.0, API v1.1, Module v1.0)
>             SSI coll: smp (SSI v1.0, API v1.1, Module v1.2)
>              SSI rpi: crtcp (SSI v1.0, API v1.1, Module v1.1)
>              SSI rpi: lamd (SSI v1.0, API v1.0, Module v7.1)
>              SSI rpi: sysv (SSI v1.0, API v1.0, Module v7.1)
>              SSI rpi: tcp (SSI v1.0, API v1.0, Module v7.1)
>              SSI rpi: usysv (SSI v1.0, API v1.0, Module v7.1)
>               SSI cr: self (SSI v1.0, API v1.0, Module v1.0)
>               Prefix: /fltapps/boeing/cfd/mpi/lam7.1.1_pgf
>               Bindir: /fltapps/boeing/cfd/mpi/lam7.1.1_pgf/bin
>               Libdir: /fltapps/boeing/cfd/mpi/lam7.1.1_pgf/lib
>               Incdir: /fltapps/boeing/cfd/mpi/lam7.1.1_pgf/include
>            Pkglibdir: /fltapps/boeing/cfd/mpi/lam7.1.1_pgf/lib/lam
>           Sysconfdir: /fltapps/boeing/cfd/mpi/lam7.1.1_pgf/etc
>         Architecture: i686-pc-linux-gnu
>        Configured by: borensbs
>        Configured on: Tue Nov 16 07:25:19 PST 2004
>       Configure host: li13200
>       Memory manager: none
>           C bindings: yes
>         C++ bindings: yes
>     Fortran bindings: yes
>           C compiler: pgcc
>          C char size: 1
>          C bool size: 1
>         C short size: 2
>           C int size: 4
>          C long size: 4
>         C float size: 4
>        C double size: 8
>       C pointer size: 4
>         C char align: 1
>         C bool align: 1
>          C int align: 4
>        C float align: 4
>       C double align: 8
>         C++ compiler: pgCC
>     Fortran compiler: pgf77
>      Fortran symbols: underscore
>    Fort integer size: 4
>       Fort real size: 4
>   Fort dbl prec size: 4
>       Fort cplx size: 4
>   Fort dbl cplx size: 4
>   Fort integer align: 4
>      Fort real align: 4
>  Fort dbl prec align: 4
>      Fort cplx align: 4
>  Fort dbl cplx align: 4
>          C profiling: yes
>        C++ profiling: yes
>    Fortran profiling: yes
>       C++ exceptions: no
>       Thread support: yes
>        ROMIO support: yes
>         IMPI support: no
>        Debug support: no
>         Purify clean: no
>             SSI base: parameter "verbose" (default value: <none>)
>              SSI mpi: parameter "mpi_hostmap" (default value:
>                       "/fltapps/boeing/cfd/mpi/lam7.1.1_pgf/etc/lam-hostmap.txt")
>             SSI base: parameter "base_module_path" (default value:
>                       "/fltapps/boeing/cfd/mpi/lam7.1.1_pgf/lib/lam")
>             SSI boot: parameter "boot_verbose" (default value: <none>)
>             SSI boot: parameter "boot" (default value: <none>)
>             SSI boot: parameter "boot_base_promisc" (default value: "0")
>             SSI boot: parameter "boot_base_window_size" (default value: "5")
>             SSI boot: parameter "boot_globus_priority" (default value: "3")
>             SSI boot: parameter "boot_rsh_username" (default value: <none>)
>             SSI boot: parameter "boot_rsh_agent" (default value: "rsh")
>             SSI boot: parameter "boot_rsh_no_n" (default value: "0")
>             SSI boot: parameter "boot_rsh_no_profile" (default value: "0")
>             SSI boot: parameter "boot_rsh_fast" (default value: "0")
>             SSI boot: parameter "boot_rsh_ignore_stderr" (default value: "0")
>             SSI boot: parameter "boot_rsh_priority" (default value: "10")
>             SSI boot: parameter "boot_slurm_priority" (default value: "50")
>             SSI boot: parameter "boot_tm_priority" (default value: "75")
>             SSI boot: parameter "boot_tm_first" (default value: "-1")
>              SSI rpi: parameter "rpi_verbose" (default value: <none>)
>              SSI rpi: parameter "rpi" (default value: <none>)
>              SSI rpi: parameter "rpi_crtcp_priority" (default value: "25")
>              SSI rpi: parameter "rpi_crtcp_short" (default value: "65536")
>              SSI rpi: parameter "rpi_crtcp_sockbuf" (default value: "-1")
>              SSI rpi: parameter "rpi_lamd_priority" (default value: "20")
>              SSI rpi: parameter "rpi_sysv_pollyield" (default value: "1")
>              SSI rpi: parameter "rpi_sysv_poolsize" (default value:
>                       "16777216")
>              SSI rpi: parameter "rpi_sysv_maxalloc" (default value:
>                       "1048576")
>              SSI rpi: parameter "rpi_sysv_short" (default value: "8192")
>              SSI rpi: parameter "rpi_tcp_short" (default value: "65536")
>              SSI rpi: parameter "rpi_tcp_sockbuf" (default value: "-1")
>              SSI rpi: parameter "rpi_sysv_priority" (default value: "30")
>              SSI rpi: parameter "rpi_tcp_priority" (default value: "20")
>              SSI rpi: parameter "rpi_usysv_readlockpoll" (default value:
>                       "10000")
>              SSI rpi: parameter "rpi_usysv_writelockpoll" (default value:
>                       "10")
>              SSI rpi: parameter "rpi_usysv_pollyield" (default value: "1")
>              SSI rpi: parameter "rpi_usysv_poolsize" (default value:
>                       "16777216")
>              SSI rpi: parameter "rpi_usysv_maxalloc" (default value:
>                       "1048576")
>              SSI rpi: parameter "rpi_usysv_short" (default value: "8192")
>              SSI rpi: parameter "rpi_usysv_priority" (default value: "40")
>             SSI coll: parameter "coll_verbose" (default value: <none>)
>             SSI coll: parameter "coll_shmem" (default value: "0")
>               SSI cr: parameter "cr_verbose" (default value: <none>)
>               SSI cr: parameter "cr" (default value: <none>)
>               SSI cr: parameter "cr_self_priority" (default value: "25")
>               SSI cr: parameter "cr_self_do_restart" (default value: "0")
>               SSI cr: parameter "cr_self_prefix" (default value:
>                       "lam_cr_self")
>               SSI cr: parameter "cr_self_checkpoint" (default value: <none>)
>               SSI cr: parameter "cr_self_continue" (default value: <none>)
>               SSI cr: parameter "cr_self_restart" (default value: <none>)
>
>
>
> Thanks for a great product.
>
> Bernie Borenstein
> The Boeing Company
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/