LAM/MPI General User's Mailing List Archives

From: Tong_Liu_at_[hidden]
Date: 2005-02-07 12:34:56


I installed BLCR and LAM/MPI as follows:
1. Compiled BLCR-0.3.1 on all cluster nodes.
2. Recompiled LAM/MPI with BLCR support on the master node only.

laminfo then shows that BLCR support has been built into LAM. I also ran
insmod for blcr.o and vmdump.o on all nodes.

I can successfully run and checkpoint an MPI application with:
# mpirun -np 4 -ssi rpi rctcp -ssi blcr application
# cr_checkpoint <PID of mpirun>
A context file was generated, but when I tried to restart the application
with the cr_restart command, it reported a file description error.

Can anybody give me some suggestions? Should I recompile LAM/MPI on all
nodes? Does cr_checkpoint generate context files on all nodes or only on
the master node? Should the directory that holds the context files be a
shared, mounted directory?

Thanks

Tong
Dell