LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-06-28 05:54:40


On Jun 27, 2005, at 7:48 AM, jerome lefevre wrote:

> Hi LAM Community,
>
> I have some trouble with PBS, Maui, and LAM. I would like to manage jobs
> with PBS, but I am having problems with the TM boot.

After you got Maui running properly, I see that you ran lamboot and it
executed properly (i.e., it used TM, the PBS internal interface). It
only found one node, which seems to indicate that you ran a PBS job
that only asked for one node. LAM obtains the list of nodes directly
from PBS, and PBS will only report the nodes that were reserved for
that job.

Did you have a job that asked for more nodes and only 1 showed up?
Specifically, what was the qsub command that you used to execute the
job, and what was your job script?
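
As a quick check from inside the job, something like the sketch below
would show how many distinct nodes PBS actually handed to the job. This
is only an illustration: the function name count_pbs_nodes, the node
count in the qsub example, and ./my_mpi_program are placeholders that I
made up, not anything taken from your setup. It assumes that PBS sets
$PBS_NODEFILE for the job, with one line per allocated processor slot.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM or PBS): count the distinct
# hosts listed in a PBS node file. A host can appear on several lines
# (one per processor slot), so duplicates are collapsed first.
count_pbs_nodes() {
    # Use the file given as $1, else fall back to $PBS_NODEFILE.
    nodefile="${1:-${PBS_NODEFILE:-}}"
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "count_pbs_nodes: no readable node file" >&2
        return 1
    fi
    sort -u "$nodefile" | wc -l | tr -d ' '
}

# Inside a job script submitted with, for example,
#   qsub -l nodes=4 job.sh
# you could then do something like:
#   echo "PBS allocated $(count_pbs_nodes) node(s)"
#   lamboot
#   mpirun C ./my_mpi_program
#   lamhalt
```

If this reports 1 for a job that you believe asked for several nodes,
then the resource request itself is the first thing to look at.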

> Note, if I run the job with the traditional sequence "lamboot",
> "mpirun", "lamhalt", it succeeds and all nodes compute.
> However, sometimes lamds are still remaining on my nodes (i suppose),
> i.e., when I restart a job with the same exe but with a newer array,
> the job failed, as if the older array was not properly cleared from
> memory.

I don't quite understand:

1. You said that lamds sometimes remain; what does "(i suppose)" mean?
Did you run "ps" on those nodes and see that the lamds are still
running after a PBS job has finished and it is no longer visible in the
PBS job queue? Are your LAM/MPI applications still running on those
nodes after the PBS job finished?

2. What exactly do you mean by "array" -- do you mean a new qsub that
you're assuming should be running on a different set of nodes? Or do
you mean a whole separate cluster? Or ...?

3. What exactly do you mean by "job failed"?

For each of these, can you post some specific output showing the exact
problems that you are seeing? Also please see the procedure for
reporting bugs with LAM at http://www.lam-mpi.org/using/support/.
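
For question 1, one way to gather that output is sketched below. It is
only a rough illustration: the function name filter_lamd and the host
names node01 and node02 are made up for the example, and it assumes
that the daemon shows up in the "ps" command column as "lamd" (possibly
with a leading path).

```shell
#!/bin/sh
# Hypothetical helper: read "ps -e -o pid,user,comm" output on stdin
# and keep only the lines whose command column is lamd (with or
# without a leading path). The header line is dropped as a side effect.
filter_lamd() {
    awk '$3 ~ /(^|\/)lamd$/'
}

# On a single node, after the PBS job has left the queue:
#   ps -e -o pid,user,comm | filter_lamd
#
# Across several nodes (host names are placeholders):
#   for h in node01 node02; do
#       echo "== $h"
#       ssh "$h" ps -e -o pid,user,comm | filter_lamd
#   done
```

Empty output for a node means that no lamd was found there at the time
of the check.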

Thanks.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/