LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-12-12 23:34:32


In the output included in the first e-mail, the lamd found is
/sw/bin/lamd. Since /sw/bin is usually where Fink puts stuff, I assume
that means /sw/bin/lamd is the Fink version of lamd and not the one
you had built. This could mean that Fink is first in your path when
the non-interactive shell used by ssh is started. You can check this
by running:

   ssh localhost which lamd

If it points to /sw/bin/lamd instead of the path to your custom LAM
build, then that's the problem. If you fix your path so that your
build is first in the path for non-interactive logins, I would be
willing to bet your problem goes away.
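
To make the comparison concrete, here is a rough sketch of how the check
could be scripted. The function name first_in_path is made up for this
example (it is not part of LAM), and the ssh lines are left as comments
because they assume you can ssh to localhost without a password prompt.
It only reports which lamd comes first in a given path list.

```shell
# Hypothetical helper (not part of LAM): print the first executable
# named $1 found in the colon-separated path list $2, which defaults
# to the current PATH. Returns non-zero if nothing is found.
first_in_path() {
    name=$1
    path_list=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $path_list; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Possible usage, assuming ssh to localhost works on your machine:
#   remote_path=$(ssh localhost 'echo $PATH')
#   first_in_path lamd "$remote_path"   # lamd seen by the ssh shell
#   first_in_path lamd "$PATH"          # lamd seen by your login shell
```

If the two results differ, the shell started by ssh is picking up a
different lamd than your interactive shell does.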

Brian

On Dec 8, 2006, at 1:20 AM, Roger Smith wrote:

> Brian, in fact that is how I built LAM: from your download, not from
> Fink directly. I already had the Fink version installed, though, so I
> wonder whether there is some sort of conflict somewhere. Anyway, I
> still cannot get lamboot to work. Any suggestions would be helpful.
> Thanks,
>
>
> Roger Smith
> Loughborough UK
> ----------------------------
>
> On 30 Nov 2006, at 03:24, Brian Barrett wrote:
>> I'm not familiar with how LAM is being built by Fink or whatever
>> system you used to build Open MPI. This is an error I've seen from
>> time to time if the LAM daemon is dynamically linked to liblam
>> instead of statically linked. I'd recommend using the build of LAM
>> for OS X found on our web page:
>>
>> http://www.lam-mpi.org/7.1/download.php
>>
>> Brian
>>
>>
>> On Nov 28, 2006, at 1:22 AM, Roger Smith wrote:
>>
>>> I am running Mac OS 10.4.8 on a dual processor PowerPC G5 with
>>> 2.5 GB RAM. I have installed LAM 7.1.2 through the desk manager
>>> system. I wish to use MPI on this single dual processor machine.
>>> However, I cannot now get lamboot to work with the newer version of
>>> lam even after running recon. It comes up with the error
>>>
>>> router (nrecv): not attached to daemon
>>>
>>> When I run lamboot -d I get the output
>>>
>>>
>>> ----------------------------------------------------------------------
>>> LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
>>>
>>> n-1<29718> ssi:boot:base: looking for boot schema in following directories:
>>> n-1<29718> ssi:boot:base: <current directory>
>>> n-1<29718> ssi:boot:base: $TROLLIUSHOME/etc
>>> n-1<29718> ssi:boot:base: $LAMHOME/etc
>>> n-1<29718> ssi:boot:base: /sw/etc/lammpi
>>> n-1<29718> ssi:boot:base: looking for boot schema file:
>>> n-1<29718> ssi:boot:base: lam-bhost.def
>>> n-1<29718> ssi:boot:base: found boot schema: /sw/etc/lammpi/lam-bhost.def
>>> n-1<29718> ssi:boot:rsh: found the following hosts:
>>> n-1<29718> ssi:boot:rsh: n0 localhost (cpu=1)
>>> n-1<29718> ssi:boot:rsh: resolved hosts:
>>> n-1<29718> ssi:boot:rsh: n0 localhost --> 127.0.0.1 (origin)
>>> n-1<29718> ssi:boot:rsh: starting RTE procs
>>> n-1<29718> ssi:boot:base:linear: starting
>>> n-1<29718> ssi:boot:base:server: opening server TCP socket
>>> n-1<29718> ssi:boot:base:server: opened port 53756
>>> n-1<29718> ssi:boot:base:linear: booting n0 (localhost)
>>> n-1<29718> ssi:boot:rsh: starting lamd on (localhost)
>>> n-1<29718> ssi:boot:rsh: starting on n0 (localhost): hboot -t -c lam-conf.lamd -d -I -H 127.0.0.1 -P 53756 -n 0 -o 0
>>> n-1<29718> ssi:boot:rsh: launching locally
>>> hboot: performing tkill
>>> hboot: tkill -d
>>> tkill: setting prefix to (null)
>>> tkill: setting suffix to (null)
>>> tkill: got killname back: /tmp/lam-mars_at_[hidden]/lam-killfile
>>> tkill: f_kill = "/tmp/lam-mars_at_[hidden]/lam-killfile"
>>> tkill: killing LAM...
>>> tkill: killing PID (SIGHUP) 29715 ...
>>> tkill: already dead
>>> tkill: removing socket file ...
>>> tkill: socket file: /tmp/lam-mars_at_[hidden]/lam-kernel-socketd
>>> tkill: removing IO daemon socket file ...
>>> tkill: IO daemon socket file: /tmp/lam-mars_at_[hidden]/lam-io-socket
>>> tkill: all finished
>>> hboot: booting...
>>> hboot: fork /sw/bin/lamd
>>> [1] 29721 lamd -H 127.0.0.1 -P 53756 -n 0 -o 0 -d
>>> n-1<29718> ssi:boot:rsh: successfully launched on n0 (localhost)
>>> hboot: attempting to execute
>>> n-1<29718> ssi:boot:base:server: expecting connection from finite list
>>> router (nrecv): not attached to daemon
>>>
>>> -----------------------------------------------------
>>>
>>> I would appreciate any help anyone can give me on this problem
>>>
>>>
>>>
>>> regards to all
>>>
>>>
>>> Roger Smith
>>> Loughborough University UK
>>>
>>>
>>