LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-06-26 14:35:08


On Thu, 26 Jun 2003, Michael Madore wrote:

> cat myapp
> h /home/mmadore/master
> C -s h /home/mmadore/slave

Note this line here -- the "-s h" switch means "send the executable from
here (this node)". Hence, the behavior you are seeing below is actually
what LAM should be doing:

> ps ax | grep flatd
> 31890 pts/2 SW 0:00 [lam-flatd1]
> 31891 pts/2 SW 0:00 [lam-flatd1]
> 31892 pts/2 SW 0:00 [lam-flatd1]
> 31893 pts/2 SW 0:00 [lam-flatd1]

The "flatd" is the minidaemon in the lamd that (among other things)
handles receiving executables from other nodes and instantiating them
locally. So the lam-flad1 is actually the slave process that is
instatiated on each node.

The [] indicate that the process is running elsewhere, and your bpsh
output shows that it is in fact running on each node. So that looks
good so far.
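For the archives, the way to double check is to look at the process
table on the compute nodes themselves rather than on the head node.
Something like this should work (I'm assuming Scyld/BProc-style integer
node IDs here -- adjust for your cluster):

  bpsh 0 ps ax
  bpsh 1 ps ax

and look for the slave executable in the output on each node.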

The question, then, is why it doesn't finish. Can you attach to the
master with a debugger and see where it stopped?
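For example, on the head node something like this should do it (I'm
assuming gdb is installed; substitute the actual PID of the master
process from ps):

  gdb /home/mmadore/master
  (gdb) attach <pid of master>
  (gdb) bt

The backtrace should show whether it is stuck in an MPI call (e.g.,
waiting on a receive that never completes) or somewhere else.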

> Also, I notice that the head node does not get assigned any work, which
> is probably desired in most cases. However (especially on smaller
> clusters), it is sometimes desirable to use the head node for
> computation also. Is this possible? I tried setting schedule=yes in my
> hosts file, but it seems like the bproc code unconditionally sets the
> NT_WASTE flag.

Oy -- that should not happen. If you assign schedule=yes, that should
override the default. I'll look into this...
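For reference, the boot schema line I would expect to work for the head
node looks something like this (hostname and cpu count are just
placeholders):

  head.example.com cpu=2 schedule=yes

i.e., schedule=yes should override the bproc default of marking the
head node as no-schedule (NT_WASTE), so that mpirun's C notation will
lay out processes there as well.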

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/