
LAM/MPI Development Mailing List Archives


From: Liu Xuezhao (lxz_at_[hidden])
Date: 2006-01-09 20:29:32


Hi,

        Let me try to answer your questions ;)

======= 2006-01-09 11:18:00 Yuan Tang Wrote: =======

>Hi Jeff & Liu,
>
>I downloaded lam-7.1.2b30 and installed it. But there are 2 problems:
>
>1. If the lamd exited when I invoked "cr_checkpoint --term
>${pid_mpirun}", then the "cr_restart context.{pid_mpirun}" could not
>restart the whole program.
This is expected. LAM/MPI execution depends on the LAM run-time environment (RTE), and the RTE state is managed by the lamd process. So the execution cannot be restarted if lamd has exited.
>
>2. Even if lamd does not exit, if I invoke "cr_checkpoint --term
>${pid_mpirun}" multiple times, "cr_restart" will always restart the
>program from the 1st/earliest checkpoint, which means the subsequent
>checkpoints do not take any effect. Actually, if I delete
>context.${pid_mpirun} during the run of the application, I find that a
>subsequent cr_checkpoint --term ${pid_mpirun} does not generate any
>checkpoint file any more. Why?
If you pass the "--term" option to cr_checkpoint, a SIGTERM signal is sent to all processes/threads in the LAM universe, and the execution aborts. A subsequent "cr_checkpoint --term ${pid_mpirun}" cannot find the corresponding process to checkpoint; it should print a message like "No such process".
>
>Normally, re-invoking "cr_checkpoint ${pid_mpirun}" will cause a
>signal 11 -- SIGSEGV
I have not run into this problem ;)
>
>BTW: in the LAM/MPI source code, what does "MPI_COMM_SPAWN'ed processes" mean?
I think MPI_COMM_SPAWN corresponds to the dynamic process creation feature of the MPI-2 standard. MPI-2 adds a new API, MPI_COMM_SPAWN, which can spawn several new processes that form a new process group.
It seems that LAM/MPI does not support checkpointing MPI_COMM_SPAWN'ed processes. But I am not sure, as I am not familiar with it ;)

>Thanks!
>
>Yuan
>
======================================================