I have successfully installed LAM/MPI under Mac OS X 10.4.5 using the pre-built
binary installer. On a single machine (2 CPUs), things work just fine,
but adding a second machine is problematic. Here are some details:
holmes:~ sdavis$ recon -v bootschema
n-1<23438> ssi:boot:base:linear: booting n0 (holmes.nhgri.nih.gov)
n-1<23438> ssi:boot:base:linear: booting n1 (watson.nhgri.nih.gov)
n-1<23438> ssi:boot:base:linear: finished
-----------------------------------------------------------------------------
Woo hoo!
....
And a bit of the lamboot -d -v output (more is available if needed):
n-1<23452> ssi:boot:rsh: remote shell /bin/tcsh
n-1<23452> ssi:boot:rsh: attempting to execute: ssh -X watson.nhgri.nih.gov -n hboot -t -c lam-conf.lamd -d -v -s -I '"-H 128.231.145.14 -P 54389 -n 1 -o 0"'
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back: /tmp/lam-sdavis_at_[hidden]/lam-killfile
tkill: f_kill = "/tmp/lam-sdavis_at_[hidden]/lam-killfile"
tkill: nothing to kill: "/tmp/lam-sdavis_at_[hidden]/lam-killfile"
hboot: performing tkill
hboot: tkill -d
hboot: booting...
hboot: fork /usr/local/bin/lamd
[1] 3628 lamd -H 128.231.145.14 -P 54389 -n 1 -o 0 -d
n-1<23452> ssi:boot:rsh: successfully launched on n1 (watson.nhgri.nih.gov)
n-1<23452> ssi:boot:base:server: expecting connection from finite list
-----------------------------------------------------------------------------
The lamboot agent timed out while waiting for the newly-booted process
to call back and indicated that it had successfully booted.
And finally, the telnet command that the output says I should try:
holmes:~ sdavis$ telnet 128.231.145.14 54389
Trying 128.231.145.14...
telnet: connect to address 128.231.145.14: Connection refused
telnet: Unable to connect to remote host
holmes:~ sdavis$
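
The same reachability check can be scripted so it is easy to repeat from either machine. This is only a minimal sketch, assuming Python 3 is available on both hosts; the helper name `can_connect` is made up for illustration, and the address and port in the usage comment are the ones from the lamboot output above (the port presumably changes on every lamboot run, so the check likely only means something while lamboot is still waiting for the callback).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, "") on success, or (False, reason) if the connection
    is refused, times out, or fails for any other socket-level reason.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False, str(exc)


# Example usage (values taken from the lamboot output above):
#   ok, reason = can_connect("128.231.145.14", 54389)
#   print("reachable" if ok else "not reachable: " + reason)
```

A refused connection comes back immediately with a reason string, while a silently dropped packet shows up as a timeout, which may help tell the two cases apart.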
Thanks,
Sean