
LAM/MPI General User's Mailing List Archives


From: Fu HongYi (fool_20022004_at_[hidden])
Date: 2007-03-13 10:19:21


Hi everyone,

I am working with BLCR and LAM/MPI, but something is going wrong and I don't know the reason.

First I installed BLCR and LAM/MPI properly. Then I tested checkpoint/restart on an MPI program running on a single node. Everything went smoothly: the program ran, the checkpoint commands executed successfully, context files were generated, and the restart also worked properly.

Later I tried the same experiment on a 2-node cluster, and it failed. I started the MPI program with the command:

    mpirun -np 2 -ssi rpi crtcp -ssi cr blcr C ./lamtest

While the program was running, I took a checkpoint with the command:

    lamcheckpoint -ssi cr blcr -pid 10411

(10411 is the PID of mpirun.) The command hung there and never returned until I pressed Ctrl-C. I checked the working directory, i.e., my home directory, and found no context file. However, some temporary files named .context-xxxxx-xx.tmp were present.

Could someone please tell me what the problem is? I would much appreciate it.

Thanks.