> I installed BLCR and LAM/MPI as follows:
> 1. Compiled BLCR-0.3.1 on all cluster nodes.
> 2. Recompiled LAM/MPI with BLCR support on the master node only.
>
> laminfo then shows that BLCR support has been built into LAM. I also
> ran insmod for blcr.o and vmdump.o on all nodes.
>
> I can successfully run an MPI application with these commands:
> # mpirun -np 4 -ssi rpi crtcp -ssi blcr application
> # cr_checkpoint <PID of mpirun>
> A context file was generated, but when I restarted the application with
> the cr_restart command, it showed a file description error.
>
> Can anybody give me some suggestions? Should I recompile LAM/MPI on all
> nodes? Will cr_checkpoint generate context files on all nodes or only
> on the master node? Should the directory that holds the context files
> be a shared, mounted directory?
>
Getting the same problem here with lam-7.1.1. BLCR works fine, though, for
local (non-MPI) applications.
hofrat