Hi all:
I have configured LAM 7.1.6 (with the parameters --with-rpi=crtcp and
--with-cr-blcr=<my BLCR directory>) and BLCR 0.3.1 successfully, and the
"lamnodes" output looks fine. Then I ran the command
"mpirun C -ssi cr blcr /blcr/hello-loop", and it showed the following error:
The "blcr" module requested in the CR kind was not found.
This typically means that you misspelled the desired module name, or used the
wrong name entirely.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n-1073747080).
mpirun can "only" be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
So I changed the command to "mpirun C -ssi CR blcr /blcr/hello-loop", and it
runs... What is the difference between the parameters "cr" and "CR"? That is
my first question. (I remember that the LAM 7.1.2 user.pdf uses "cr".)
Next, I took a checkpoint with the command "lamcheckpoint -ssi cr blcr -pid
<my hello-loop's PID>", and
it created the checkpoint file "context.mpirun.16028" successfully. Now when I
run "lamrestart -ssi cr blcr -ssi cr_blcr_context_file context.mpirun.16028", it
always shows "mpirun (rpwait) : Bad file descriptor". I would like to know why.
This is my second question.
I hope you can help me fix this problem. Thanks and regards