
LAM/MPI Development Mailing List Archives


From: Yuan Tang (yuantang_at_[hidden])
Date: 2006-01-03 14:47:28


Dear LAM developer/maintainer,

I am installing LAM/MPI with BLCR on my Linux cluster. After installing
BLCR, everything seemed OK, and I checked that cr_checkpoint/cr_restart
work fine with a small C test program. But when I compile an MPI program
with LAM, cr_checkpoint --term followed by cr_restart no longer works.
For example, I first start the application with:
lamboot ./hostf_lam
/home/yuantang/local/bin/mpirun C -ssi rpi crtcp -ssi cr blcr \
    -x LD_LIBRARY_PATH ${prog}

Then I run:
>cr_checkpoint --term $PID_of_mpirun

This generates the file context.$PID_of_mpirun in the current directory.
Then I run:
>cr_restart $PID_of_mpirun
It reports:
>mpirun (rpwait): Bad file descriptor
and exits.
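
For anyone who wants to repeat this, the checkpoint and restart steps above can be sketched as a small shell function. This is only a sketch: the helper names run and checkpoint_and_restart and the DRY_RUN variable are made up for illustration and are not part of LAM or BLCR. The cr_checkpoint and cr_restart arguments are passed exactly as I typed them above, and the context.<pid> file name is the one I observed on my system.

```shell
#!/bin/sh
# Checkpoint-and-restart sketch for an already running mpirun.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command, or execute it when DRY_RUN is not 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: checkpoint the given mpirun PID, then restart it.
checkpoint_and_restart() {
    pid="$1"
    run cr_checkpoint --term "$pid" || return 1
    # In a real run, confirm the context file was written before restarting
    # (assumes the context.<pid> naming seen above).
    if [ "$DRY_RUN" != "1" ] && [ ! -f "context.$pid" ]; then
        echo "context.$pid not found" >&2
        return 1
    fi
    run cr_restart "$pid"
}
```

With the default DRY_RUN=1, calling checkpoint_and_restart with the PID of mpirun only prints the two commands. Setting DRY_RUN=0 before calling it would run them for real, which needs a booted LAM universe and a running mpirun.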

My LAM configuration line is as follows:
./configure --prefix=/home/yuantang/local --with-rpi=crtcp \
    --with-threads=posix --with-wrapper-extra-ldflags \
    --with-cr-blcr=/home/yuantang/local --with-cr-base-file-dir=/tmp

Could you give me some hints about what might be wrong?

Thanks!

Yuan