LAM/MPI General User's Mailing List Archives

From: thanhtn (xinmothomthu_at_[hidden])
Date: 2005-09-16 03:02:33


Hi all,

I am using LAM (v7.0.6) + BLCR (v0.3.1) + PBS (v2.3.16).
I think I installed them correctly because:
 - I can submit jobs successfully.
 - I can checkpoint and restart successfully an MPI
program that is run directly with the mpirun command.
But when I submit an MPI job through PBS (my script is
below), I can checkpoint the mpirun process and it
generates a context file (each MPI process also gets a
context file), but I cannot restart from it.

- Here is my script:
#!/bin/sh
#PBS -l walltime=10:00:00
#PBS -l mem=400mb
#PBS -l ncpus=2
#PBS -j oe
 
lamboot
mpirun N -ssi rpi crtcp -ssi cr blcr ./hello
lamhalt

- I submit the job with the command:
qsub myscript
- I checkpoint with the command:
cr_checkpoint <PID of mpirun>
- I restart with the command:
cr_restart <context file>
- Although I had run lamboot, I still get this error:
-----------------------------------------------------------------------------
It seems that there is no lamd running on the host may15.

This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for the "mpirun" command.

Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
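
For reference, the checkpoint and restart steps above can be written as a small shell sketch. This is only an illustration of the sequence I ran, not a fix: the helper names run_step and cr_sequence and the DRY_RUN variable are made up for this example, and by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch of the checkpoint / restart sequence described above.
# run_step, cr_sequence and DRY_RUN are hypothetical names used
# only for illustration; they are not part of LAM, BLCR or PBS.
DRY_RUN=${DRY_RUN:-1}

run_step() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

cr_sequence() {
    # $1 = PID of the mpirun process started by the PBS job
    # $2 = context file written by cr_checkpoint for that PID
    run_step cr_checkpoint "$1"
    run_step cr_restart "$2"
}

# Usage (placeholders, to be filled in by hand):
#   cr_sequence <PID of mpirun> <context file>
```

With DRY_RUN left at its default, calling cr_sequence only echoes the two commands in order, which makes it easy to check the PID and context file before running them for real with DRY_RUN=0.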

Can anyone help me?
