LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2007-12-16 22:28:21


On Dec 3, 2007, at 6:15 PM, Mukuntan wrote:

> I am implementing a checkpoint replication and process migration
> scheme in LAM using BLCR. I went through previous threads discussing
> migration and found a lot of useful information. I would like to
> thank the LAM developers and other members for their useful inputs.
>
> I have an issue in my implementation. When I attempt to start a
> process on another node, it does not attach to the lam daemon there.
> The exact error it throws is 'no LAM daemon found'. This call to
> kenter occurs before the process receives any GPS information from
> mpirun, and before any of the GPS updates for migration can be done.
> Why is it unable to attach to the lam daemon of the node it is
> migrated to?

Sorry about the delay in replying. There's only me these days, and I
don't really have much time for LAM anymore.

That error usually means for some reason the path to the unix domain
socket that the LAMD is using doesn't match the path that the
application thinks the LAMD is using. You might want to attach a
debugger to your restarted process and see what path it's come up with
for the unix domain socket. My guess is it won't match where the
socket really is (which will be
$TMPDIR/lam-<user>@<hostname>/lam-kernel-socket).
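
As a quick check outside the debugger, something like the following could be run on the node the process was migrated to. It is only a sketch: it rebuilds the path from the layout given above and tests whether a unix domain socket exists there. The function names are hypothetical, the fallback to /tmp when TMPDIR is unset is an assumption, and so is the use of whatever hostname the OS reports (the LAMD may have used a different form of the name).

```python
import getpass
import os
import socket
import stat


def expected_lamd_socket(tmpdir=None, user=None, hostname=None):
    """Build $TMPDIR/lam-<user>@<hostname>/lam-kernel-socket.

    The layout comes from the message above; the defaults are assumptions.
    """
    # Assumption: fall back to /tmp when TMPDIR is not set.
    tmpdir = tmpdir or os.environ.get("TMPDIR", "/tmp")
    user = user or getpass.getuser()
    # Assumption: the daemon used the same hostname the OS reports here.
    hostname = hostname or socket.gethostname()
    session_dir = "lam-%s@%s" % (user, hostname)
    return os.path.join(tmpdir, session_dir, "lam-kernel-socket")


def is_unix_socket(path):
    """Return True only if path exists and is a unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    path = expected_lamd_socket()
    state = "socket present" if is_unix_socket(path) else "no socket here"
    print("%s: %s" % (path, state))
```

If this reports no socket on the target node, comparing TMPDIR, the user name, and the hostname between the original and the restarted environment is a reasonable next step, since any of the three changes the path.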

Brian

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!