LAM/MPI General User's Mailing List Archives

From: Pirabhu Raman (pirabhur_at_[hidden])
Date: 2004-02-19 00:48:55


Hi,

I am trying to use LAM with BLCR. I first installed BLCR with the following
commands:
configure
make
make install
Since I did not specify the prefix option, BLCR was installed in the default
/usr/local directory. Then I installed LAM, configuring it with the command
configure --with-blcr=/usr/local --with-rpi-crtcp

Checkpointing ordinary (non-MPI) processes with BLCR works fine. I then ran
lamboot and started a parallel job with the command
mpirun -ssi rpi crtcp -ssi cr blcr -np 4 ./ring
This fails with an error stating that the blcr module in the CR kind was not
found, and that this typically means the module name was misspelled.
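
One way to check whether the blcr module was actually built into this LAM installation would be to look at the output of laminfo. The following is only a minimal sketch, assuming laminfo prints one "SSI <kind>: <module> (...)" line per compiled-in module; the helper name has_ssi_module is made up for illustration and is not part of LAM.

```shell
# has_ssi_module KIND NAME
# Reads laminfo-style text on stdin and succeeds (exit status 0) if it
# contains a line of the form "SSI KIND: NAME ...".
# Assumption: compiled-in modules are listed one per line in that form.
has_ssi_module() {
    grep -Eq "^[[:space:]]*SSI[[:space:]]+$1:[[:space:]]+$2([[:space:]]|\$)"
}

# Intended use on a machine where LAM is installed:
#   laminfo | has_ssi_module cr blcr && echo "blcr cr module is available"
```

If the check fails, the cr blcr module was presumably not compiled in, which would be consistent with the "not found" error above.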

So I ran the program with the command
mpirun -ssi rpi crtcp -np 4 ./ring
and it works fine. I then checkpoint it with the command
cr_checkpoint 23245
where 23245 is the PID of mpirun. A single file named context.23245 is
created and no other files. (Should other files be created?) This file is
created on the node where I run cr_checkpoint. (Note that I don't have NFS
on my test cluster.)

When I try to restart the original program from the context file with the
command
cr_restart 23245
I get the error
mpirun (rpwait) : bad file descriptor.
(Note: the original process had already completed execution by then.)

Please let me know whether these errors are due to a mistake in the
installation or whether I am missing some options.

Thanks in advance,
Pirabhu
