MANUAL LAMBOOT
°°°°°°°°°°°°°°°

n-1<7037> ssi:boot:open: opening
n-1<7037> ssi:boot:open: opening boot module globus
n-1<7037> ssi:boot:open: opened boot module globus
n-1<7037> ssi:boot:open: opening boot module rsh
n-1<7037> ssi:boot:open: opened boot module rsh
n-1<7037> ssi:boot:open: opening boot module slurm
n-1<7037> ssi:boot:open: opened boot module slurm
n-1<7037> ssi:boot:select: initializing boot module slurm
n-1<7037> ssi:boot:slurm: not running under SLURM
n-1<7037> ssi:boot:select: boot module not available: slurm
n-1<7037> ssi:boot:select: initializing boot module rsh
n-1<7037> ssi:boot:rsh: module initializing
n-1<7037> ssi:boot:rsh:agent: /usr/bin/ssh
n-1<7037> ssi:boot:rsh:username:
n-1<7037> ssi:boot:rsh:verbose: 1000
n-1<7037> ssi:boot:rsh:algorithm: linear
n-1<7037> ssi:boot:rsh:no_n: 0
n-1<7037> ssi:boot:rsh:no_profile: 0
n-1<7037> ssi:boot:rsh:fast: 0
n-1<7037> ssi:boot:rsh:ignore_stderr: 0
n-1<7037> ssi:boot:rsh:priority: 10
n-1<7037> ssi:boot:select: boot module available: rsh, priority: 10
n-1<7037> ssi:boot:select: initializing boot module globus
n-1<7037> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<7037> ssi:boot:select: boot module not available: globus
n-1<7037> ssi:boot:select: finalizing boot module slurm
n-1<7037> ssi:boot:slurm: finalizing
n-1<7037> ssi:boot:select: closing boot module slurm
n-1<7037> ssi:boot:select: finalizing boot module globus
n-1<7037> ssi:boot:globus: finalizing
n-1<7037> ssi:boot:select: closing boot module globus
n-1<7037> ssi:boot:select: selected boot module rsh

LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University

n-1<7037> ssi:boot:base: looking for boot schema in following directories:
n-1<7037> ssi:boot:base:
n-1<7037> ssi:boot:base: $TROLLIUSHOME/etc
n-1<7037> ssi:boot:base: $LAMHOME/etc
n-1<7037> ssi:boot:base: /usr/local/etc
n-1<7037> ssi:boot:base: looking for boot schema file:
n-1<7037> ssi:boot:base: /home/griduser/mpi-check-parallel/machines
n-1<7037> ssi:boot:base: found boot schema: /home/griduser/mpi-check-parallel/machines
n-1<7037> ssi:boot:rsh: found the following hosts:
n-1<7037> ssi:boot:rsh: n0 xxx.xxx.xx.131 (cpu=2)
n-1<7037> ssi:boot:rsh: resolved hosts:
n-1<7037> ssi:boot:rsh: n0 xxx.xxx.xx.131 --> xxx.xxx.xx.131 (origin)
n-1<7037> ssi:boot:rsh: starting RTE procs
n-1<7037> ssi:boot:base:linear: starting
n-1<7037> ssi:boot:base:server: opening server TCP socket
n-1<7037> ssi:boot:base:server: opened port 58448
n-1<7037> ssi:boot:base:linear: booting n0 (xxx.xxx.xx.131)
n-1<7037> ssi:boot:rsh: starting lamd on (xxx.xxx.xx.131)
n-1<7037> ssi:boot:rsh: starting on n0 (xxx.xxx.xx.131): hboot -t -c lam-conf.lamd -d -I -H xxx.xxx.xx.131 -P 58448 -n 0 -o 0
n-1<7037> ssi:boot:rsh: launching locally
hboot: performing tkill
hboot: tkill -d
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back: /tmp/lam-griduser@studpc11/lam-killfile
tkill: f_kill = "/tmp/lam-griduser@studpc11/lam-killfile"
tkill: killing LAM...
tkill: killing PID (SIGHUP) 6579 ...
tkill: killed
tkill: killing PID (SIGHUP) 6580 ...
tkill: already dead
tkill: killing PID (SIGHUP) 6582 ...
tkill: killed
tkill: killing PID (SIGHUP) 6581 ...
tkill: killed
tkill: removing socket file ...
tkill: socket file: /tmp/lam-griduser@studpc11/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file: /tmp/lam-griduser@studpc11/lam-io-socket
tkill: all finished
hboot: booting...
hboot: fork /usr/local/bin/lamd
hboot: attempting to execute
n-1<7040> ssi:boot:open: opening
n-1<7040> ssi:boot:open: opening boot module globus
n-1<7040> ssi:boot:open: opened boot module globus
n-1<7040> ssi:boot:open: opening boot module rsh
n-1<7040> ssi:boot:open: opened boot module rsh
n-1<7040> ssi:boot:open: opening boot module slurm
n-1<7040> ssi:boot:open: opened boot module slurm
n-1<7040> ssi:boot:select: initializing boot module slurm
n-1<7040> ssi:boot:slurm: not running under SLURM
n-1<7040> ssi:boot:select: boot module not available: slurm
n-1<7040> ssi:boot:select: initializing boot module globus
n-1<7040> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<7040> ssi:boot:select: boot module not available: globus
n-1<7040> ssi:boot:select: initializing boot module rsh
n-1<7040> ssi:boot:rsh: module initializing
n-1<7040> ssi:boot:rsh:agent: /usr/bin/ssh
n-1<7040> ssi:boot:rsh:username:
n-1<7040> ssi:boot:rsh:verbose: 1000
n-1<7040> ssi:boot:rsh:algorithm: linear
n-1<7040> ssi:boot:rsh:no_n: 0
n-1<7040> ssi:boot:rsh:no_profile: 0
n-1<7040> ssi:boot:rsh:fast: 0
n-1<7040> ssi:boot:rsh:ignore_stderr: 0
n-1<7040> ssi:boot:rsh:priority: 10
n-1<7040> ssi:boot:select: boot module available: rsh, priority: 10
n-1<7040> ssi:boot:select: finalizing boot module slurm
n-1<7040> ssi:boot:slurm: finalizing
n-1<7040> ssi:boot:select: closing boot module slurm
n-1<7040> ssi:boot:select: finalizing boot module globus
n-1<7040> ssi:boot:globus: finalizing
n-1<7040> ssi:boot:select: closing boot module globus
n-1<7040> ssi:boot:select: selected boot module rsh
n-1<7040> ssi:boot:send_lamd: getting node ID from command line
n-1<7040> ssi:boot:send_lamd: getting agent haddr from command line
n-1<7040> ssi:boot:send_lamd: getting agent port from command line
n-1<7040> ssi:boot:send_lamd: getting node ID from command line
n-1<7040> ssi:boot:send_lamd: connecting to xxx.xxx.xx.131:58448, node id 0
n-1<7040> ssi:boot:send_lamd: sending dli_port 47649
[1] 7040 lamd -H xxx.xxx.xx.131 -P 58448 -n 0 -o 0 -d
n-1<7037> ssi:boot:rsh: successfully launched on n0 (xxx.xxx.xx.131)
n-1<7037> ssi:boot:base:server: expecting connection from finite list
n-1<7037> ssi:boot:base:server: got connection from xxx.xxx.xx.131
n-1<7037> ssi:boot:base:server: this connection is expected (n0)
n-1<7037> ssi:boot:base:server: remote lamd is at xxx.xxx.xx.131:47649
n-1<7037> ssi:boot:base:server: closing server socket
n-1<7037> ssi:boot:base:server: connecting to lamd at xxx.xxx.xx.131:38472
n-1<7037> ssi:boot:base:server: connected
n-1<7037> ssi:boot:base:server: sending number of links (1)
n-1<7037> ssi:boot:base:server: sending info: n0 (xxx.xxx.xx.131)
n-1<7037> ssi:boot:base:server: finished sending
n-1<7037> ssi:boot:base:server: disconnected from xxx.xxx.xx.131:38472
n-1<7037> ssi:boot:base:linear: finished
n-1<7037> ssi:boot:rsh: all RTE procs started
n-1<7037> ssi:boot:rsh: finalizing
n-1<7037> ssi:boot: Closing
n-1<7040> ssi:boot:rsh: finalizing
n-1<7040> ssi:boot: Closing

________________________________________________________________________________________________________

WRAPPER - LAMBOOT
°°°°°°°°°°°°°°°°°

n-1<6045> ssi:boot:open: opening
n-1<6045> ssi:boot:open: opening boot module globus
n-1<6045> ssi:boot:open: opened boot module globus
n-1<6045> ssi:boot:open: opening boot module rsh
n-1<6045> ssi:boot:open: opened boot module rsh
n-1<6045> ssi:boot:open: opening boot module slurm
n-1<6045> ssi:boot:open: opened boot module slurm
n-1<6045> ssi:boot:select: initializing boot module slurm
n-1<6045> ssi:boot:slurm: not running under SLURM
n-1<6045> ssi:boot:select: boot module not available: slurm
n-1<6045> ssi:boot:select: initializing boot module rsh
n-1<6045> ssi:boot:rsh: module initializing
n-1<6045> ssi:boot:rsh:agent: /usr/local/bin
n-1<6045> ssi:boot:rsh:username:
n-1<6045> ssi:boot:rsh:verbose: 1000
n-1<6045> ssi:boot:rsh:algorithm: linear
n-1<6045> ssi:boot:rsh:no_n: 0
n-1<6045> ssi:boot:rsh:no_profile: 0
n-1<6045> ssi:boot:rsh:fast: 1
n-1<6045> ssi:boot:rsh:ignore_stderr: 0
n-1<6045> ssi:boot:rsh:priority: 10
n-1<6045> ssi:boot:select: boot module available: rsh, priority: 10
n-1<6045> ssi:boot:select: initializing boot module globus
n-1<6045> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<6045> ssi:boot:select: boot module not available: globus
n-1<6045> ssi:boot:select: finalizing boot module slurm
n-1<6045> ssi:boot:slurm: finalizing
n-1<6045> ssi:boot:select: closing boot module slurm
n-1<6045> ssi:boot:select: finalizing boot module globus
n-1<6045> ssi:boot:globus: finalizing
n-1<6045> ssi:boot:select: closing boot module globus
n-1<6045> ssi:boot:select: selected boot module rsh
n-1<6045> ssi:boot:base: looking for boot schema in following directories:
n-1<6045> ssi:boot:base:
n-1<6045> ssi:boot:base: $TROLLIUSHOME/etc
n-1<6045> ssi:boot:base: $LAMHOME/etc
n-1<6045> ssi:boot:base: /usr/local/etc
n-1<6045> ssi:boot:base: looking for boot schema file:
n-1<6045> ssi:boot:base: machines
n-1<6045> ssi:boot:base: found boot schema: machines
n-1<6045> ssi:boot:rsh: found the following hosts:
n-1<6045> ssi:boot:rsh: n0 studpc11 (cpu=2)
-----------------------------------------------------------------------------
The boot SSI rsh module found that your local host is not in the
hostfile "machines". The local host name *must* be in the list of hosts
in the hostfile. In other words, you must boot LAM from a node that will
be part of the universe.

- If you simply forgot to put the local host in the boot schema file,
  add it and re-run The boot SSI rsh module

- If you are trying to boot LAM from a node that will not be part of
  the universe, you must login to on of the nodes that will be part of
  the universe (i.e., one of the nodes in the hostfiles), and re-run
  The boot SSI rsh module

Although the local host name is usually the first in the list to avoid
I/O ambiguities, it can actually appear anywhere in the list.
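
The error above says the local host must appear in the boot schema file. As a rough way to check that condition outside of lamboot, here is a minimal Python sketch. It is not LAM's own logic: the function names (parse_boot_schema, resolve, local_host_listed) are made up for illustration, and it assumes a boot schema with one host per line, optional key=value fields such as cpu=2 after the host name, and "#" starting a comment. Matching is done first by name and then by resolved address, which is only an approximation of whatever comparison lamboot performs internally.

```python
import socket


def parse_boot_schema(text):
    """Return the host names listed in boot-schema text.

    Assumed format: one host per line, optional key=value fields
    (for example cpu=2) after the host name, '#' starts a comment.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue  # blank or comment-only line
        hosts.append(line.split()[0])  # first token is the host name
    return hosts


def resolve(name):
    """Resolve a host name to an IPv4 address, or None if that fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def local_host_listed(hosts, local_name, resolver=resolve):
    """Return True if local_name matches a listed host by name or address.

    The resolver is injectable so the check can be exercised without DNS.
    """
    local_addr = resolver(local_name)
    for host in hosts:
        if host == local_name:
            return True
        addr = resolver(host)
        if addr is not None and addr == local_addr:
            return True
    return False
```

To try it against the schema from the wrapper run, read the file the wrapper actually passes to lamboot (here "machines", relative to the wrapper's working directory) and call local_host_listed(parse_boot_schema(text), socket.gethostname()). If the result is False, compare what the host name in the file and the local host name each resolve to on that machine.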