On Fri, 2006-06-30 at 16:10 +0530, shravya reddy wrote:
> I have some queries regarding lam-mpi.
>
> 1) In Lam-Mpi in file otb/sys/lamd/lamd_main.c
> lam daemons are calling run_kernel() function...so I want to
> ask how is the interaction between lam daemons and
> mpirun taking place.
> And how is the similar thing happening in open mpi.
>
> 2) What is the significance of daemons in the process of
> checkpointing....as far as I can figure out daemons are used only to
> fork application processes and to catch the SIGCHLD signal....

All processes in the LAM environment communicate through a Unix domain
socket created by the LAM kernel process at lamboot / startup time. All
LAM communication is routed through the LAM daemons, although MPI
traffic can have its own channels.
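
To make the Unix domain socket pattern concrete, here is a minimal
sketch in Python. It is not LAM code: the socket path, the function
name run_demo, and the "routed: " reply prefix are all made up for
illustration. A toy "daemon" creates the socket, and a client connects
to it and gets its message passed back.

```python
import os
import socket
import tempfile
import threading


def run_demo(message: bytes = b"hello from client") -> bytes:
    """Start a toy daemon on a Unix domain socket, send one message,
    and return the daemon's reply. Hypothetical sketch, not LAM's protocol."""
    sock_dir = tempfile.mkdtemp()
    path = os.path.join(sock_dir, "daemon.sock")  # hypothetical socket path

    # The "daemon" side: create the socket file and listen on it.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)

    def serve_once() -> None:
        # Accept a single client and echo its message back with a prefix.
        conn, _ = server.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"routed: " + data)

    worker = threading.Thread(target=serve_once)
    worker.start()

    # The "client" side: connect through the filesystem path.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(message)
        reply = client.recv(1024)

    # Clean up the thread, the socket, and the temporary directory.
    worker.join()
    server.close()
    os.unlink(path)
    os.rmdir(sock_dir)
    return reply
```

Calling run_demo(b"ping") sends one message through the socket and
returns whatever the toy daemon sent back.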
The daemons are basically not involved in the checkpoint/restart code,
other than forwarding signals. We have a paper on our web site about
our implementation of checkpoint/restart. It should have all the
details you need about what we implemented.
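
For a feel of what "forwarding signals" means in general, here is a
small Python sketch. It is only an assumption-laden illustration, not
how lamd does it: the choice of SIGUSR1, the exit code 42, and the
names CHILD_CODE and forward_demo are all invented. A parent starts a
child process, and when the parent receives the signal it passes the
same signal on to the child.

```python
import os
import signal
import subprocess
import sys

# Hypothetical child program: exits with status 42 when it gets SIGUSR1.
CHILD_CODE = r"""
import signal, sys, time

def on_usr1(signum, frame):
    sys.exit(42)

signal.signal(signal.SIGUSR1, on_usr1)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def forward_demo() -> int:
    """Start a child, forward SIGUSR1 to it, and return its exit status.
    Must be called from the main thread (signal handlers require it)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE
    )
    # Wait until the child reports that its handler is installed.
    child.stdout.readline()

    def forward(signum, frame):
        # The parent does no work itself; it only relays the signal.
        child.send_signal(signum)

    previous = signal.signal(signal.SIGUSR1, forward)
    try:
        # Deliver the signal to the parent, which relays it to the child.
        os.kill(os.getpid(), signal.SIGUSR1)
        return child.wait(timeout=10)
    finally:
        # Restore the earlier handler and release the pipe.
        signal.signal(signal.SIGUSR1, previous)
        child.stdout.close()
```

The parent never interprets the signal; the child decides what to do
with it, which is the sense in which a relaying process stays out of
the actual work.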
Brian