LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-07-19 13:35:45


On Jul 19, 2005, at 1:36 PM, Eric Adint wrote:

> I am using LAM 7.0.1 on a 53-Xserve cluster, and we have multiple users
> running LAM jobs. According to the documentation, LAM is supposed to be
> able to accommodate this, but when we have multiple users running jobs,
> one of the users cannot run MPI tasks. I am using mpirun -np 32
> <executable> with lamboot and a standard host file. Is there a
> special command that I need to use so that the different LAM
> processes do not interfere with each other?

Each Unix user gets a separate LAM universe, and those universes should
not interfere with each other.
For example, if user jdoe runs "lamboot" with a standard hostfile,
there will be a LAM universe for jdoe. If user bsmith then runs
lamboot with the same hostfile, there will be another LAM universe on
the same nodes, but for bsmith.

Both users can then mpirun their jobs, and they won't interfere with
each other (from a LAM perspective).

That being said, if you're running multiple parallel jobs on the same
node, it is highly likely that they are going to compete with each
other for CPU cycles, RAM, network resources, etc. So, for example, if
you have a Myrinet network and have multiple users all running MPI jobs
on a single node, it is quite possible that you'll run out of "special"
memory for Myrinet and things will go badly.

A common model for clusters is actually to avoid sharing CPUs -- use a
batch scheduler of some kind such that MPI apps get full usage of the
CPUs and/or nodes that they have been assigned. This avoids the
problems discussed above.
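
To make the "don't share CPUs" idea concrete, here is a toy sketch in Python of first-come, first-served allocation with exclusive CPU slots. It is only an illustration of the model, not how any particular batch scheduler works; the function name `schedule`, the node names, and the CPU counts are all made up for the example.

```python
def schedule(nodes, jobs):
    """Toy exclusive-CPU allocator (hypothetical, for illustration only).

    nodes: dict mapping node name -> number of CPUs on that node
    jobs:  list of (job name, CPUs requested), in submission order
    Returns (placed, waiting): placed maps job name -> {node: CPUs granted},
    waiting lists jobs that must queue because too few CPUs are idle.
    """
    free = dict(nodes)  # CPUs not yet assigned, per node
    placed, waiting = {}, []
    for name, wanted in jobs:
        if wanted > sum(free.values()):
            # Not enough idle CPUs: queue the job rather than share CPUs.
            waiting.append(name)
            continue
        grant = {}
        for node in free:
            take = min(free[node], wanted)
            if take:
                grant[node] = take
                free[node] -= take
                wanted -= take
            if wanted == 0:
                break
        placed[name] = grant
    return placed, waiting


# Assumed example: three dual-CPU nodes, two users each asking for 4 CPUs.
nodes = {"n0": 2, "n1": 2, "n2": 2}
placed, waiting = schedule(nodes, [("jdoe", 4), ("bsmith", 4)])
print(placed)   # jdoe gets n0 and n1 to itself
print(waiting)  # bsmith waits until CPUs are released
```

The point is that bsmith's job is held back instead of being stacked on top of jdoe's processes, so each running MPI job has its CPUs to itself.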

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/