LAM/MPI General User's Mailing List Archives

From: AN U (anu.tech_at_[hidden])
Date: 2006-05-25 04:06:30


I tried to checkpoint MPI programs, but it failed. The details of the problem are as follows.
 
 I first installed blcr-0.4.2 using the default installation method, then LAM-7.0.6 with
 
 ./configure --with-rsh="ssh -x" --prefix=/opt/lam-7.0.6
 make
 make install
 
 
 The problem is this: when I run a single-CPU job, it runs and I am able to checkpoint it using
 
  cr_checkpoint -v <pid of prog-name>
 
 
 But when I run a multi-processor job and try to checkpoint it using
 
   mpirun -np 8 --ssi cr blcr [ .. some more options ..] xhpl
 
  cr_checkpoint -v --term <pid of mpirun>
 
 I get the following output:
      checkpoint request issued
      checkpoint request completed
      ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP): No such process
 
 
 When I try to checkpoint it using <pid of prog-name> instead, the program simply hangs.
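
 To make the multi-process attempt easier to reproduce, here is a minimal sketch of a dry-run helper. It only prints the checkpoint command for a given PID rather than running it. The function names build_checkpoint_cmd and find_mpirun_pid are hypothetical and made up for illustration; only cr_checkpoint and its -v and --term options come from the commands above, and the pgrep lookup assumes a single mpirun is running under my user.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; only `cr_checkpoint -v --term <pid>`
# is taken from the commands quoted in this message.

# Print the checkpoint command for a given PID instead of running it (dry run).
build_checkpoint_cmd() {
    pid="$1"
    # Reject anything that is not a non-empty string of digits.
    case "$pid" in
        ''|*[!0-9]*) echo "invalid pid: $pid" >&2; return 1 ;;
    esac
    echo "cr_checkpoint -v --term $pid"
}

# Look up the oldest process named exactly "mpirun" owned by the current user.
# Assumption: pgrep is available and only one mpirun job matters.
find_mpirun_pid() {
    pgrep -o -x -u "$(id -u)" mpirun
}

# Usage (not executed here):
#   pid=$(find_mpirun_pid) && build_checkpoint_cmd "$pid"
```

 Printing the command first makes it easy to confirm that the PID passed is the one for mpirun and not one of the xhpl processes.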
 
 Can someone point out what is wrong?
 
 Thanks for your time
 
 Anu
 
                