LAM/MPI General User's Mailing List Archives

From: Bogdan Costescu (bogdan.costescu_at_[hidden])
Date: 2004-08-11 11:51:22


On Wed, 11 Aug 2004, C.L. Lai [ALAN] wrote:

> The scheduler tends to take as many nodes as possible

If this is not what you want, try setting allocation_rule to $fill_up
in the PE definition (see 'man sge_pe'), which will allocate as many
slots as possible from a node before going to another node.
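
To make the difference between the two behaviours concrete, here is a
deliberately simplified sketch. It is not SGE's scheduler (which also
looks at load, queue ordering and so on); the function name `allocate`
and the host names are made up for illustration. It only shows how a
$fill_up style rule packs slots onto a node before moving on, while a
$round_robin style rule spreads them over as many nodes as possible.

```python
def allocate(free_slots, requested, rule="fill_up"):
    """Toy model of distributing `requested` slots over hosts.

    free_slots: dict mapping host name -> number of free slots; the
    dict order stands in for the order in which hosts are considered.
    Returns a dict host -> granted slots (hosts with 0 are omitted).
    """
    granted = {host: 0 for host in free_slots}
    remaining = requested

    if rule == "fill_up":
        # Take as many slots as possible from one host before the next.
        for host, free in free_slots.items():
            take = min(free, remaining)
            granted[host] = take
            remaining -= take
            if remaining == 0:
                break
    elif rule == "round_robin":
        # Hand out one slot per host per pass until the request is met.
        while remaining > 0:
            progressed = False
            for host, free in free_slots.items():
                if remaining > 0 and granted[host] < free:
                    granted[host] += 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                break
    else:
        raise ValueError("unknown rule: " + rule)

    if remaining > 0:
        raise RuntimeError("not enough free slots for the request")
    return {host: n for host, n in granted.items() if n > 0}


hosts = {"n1": 2, "n2": 2, "n3": 2, "n4": 2}
print(allocate(hosts, 4, "fill_up"))      # two nodes, 2 slots each
print(allocate(hosts, 4, "round_robin"))  # four nodes, 1 slot each
```

With 4 slots requested on four dual-slot nodes, the round-robin case
ends up with only 1 slot per node, which is the situation described in
the quoted text.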

> In such case, the number of allocated slots for the job is only 1.
> So did you say the problem comes from here?

Yes.

> I have also tried specifying more required-processors than the number of
> nodes, so that some nodes will have more than 1 slot allocated, but the
> result is the same.

lamd needs to be booted on all nodes, so _all_ nodes must have more
than 1 slot allocated, not only some of them.

> Is it like the startup of lamd is treated as part of the job and
> requires an extra slot for this startup?

Yes.

> So that the required-slots is allocated-slots + 1 ?

The number of required slots is always 2: one for the qrsh-remote
step and one for the qrsh-local step. After lamd is started by the
qrsh-local step, SGE is no longer involved; all processes are children
of lamd. LAM doesn't care about the slots allocated by SGE, but because
the boot schema is created from the information provided by SGE, LAM
will start on a node only as many processes as SGE allocated slots.
However, because lamd was started by qrsh, SGE still has control over
how long the whole LAM ensemble (lamd + children) can run and can
also send signals (for example, to make qdel work on a running job).
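
As a rough illustration of "the boot schema is created based on the
information from SGE", here is a minimal sketch, not the actual
integration script. It assumes the usual $PE_HOSTFILE layout where
each line starts with the host name followed by the number of slots,
and it writes LAM boot schema lines of the form "host cpu=N". The
function name and the `min_slots` check (taken from the 2-slot
requirement above) are my own additions for the example.

```python
def boot_schema_from_pe_hostfile(text, min_slots=2):
    """Turn PE hostfile text into LAM boot schema text.

    Only the first two fields of each line (host, slots) are used;
    any further fields are ignored. Raises ValueError if a host has
    fewer than `min_slots` slots, since lamd could not be started
    there under the scheme described above.
    """
    schema_lines = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip empty lines
        host, slots = fields[0], int(fields[1])
        if slots < min_slots:
            raise ValueError(
                "%s has only %d slot(s), need at least %d"
                % (host, slots, min_slots)
            )
        schema_lines.append("%s cpu=%d" % (host, slots))
    return "\n".join(schema_lines) + "\n"


# Hypothetical hostfile content, for illustration only.
example = (
    "node01 2 all.q@node01 UNDEFINED\n"
    "node02 4 all.q@node02 UNDEFINED\n"
)
print(boot_schema_from_pe_hostfile(example), end="")
```

A hostfile in which any node has a single slot makes the function
fail early, which mirrors the problem discussed in this thread.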

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]