On 2/24/07, Brian Barrett <brbarret_at_[hidden]> wrote:
> On Feb 21, 2007, at 1:44 AM, Ramon Diaz-Uriarte wrote:
>
> > On 2/20/07, Brian Barrett <brbarret_at_[hidden]> wrote:
> >> This is a problem with the LAM daemon and internal resource
> >> limitations. The LAM daemons were not intended to be used to run
> >> that many processes on one node. Unfortunately, this will not be
> >> fixed in LAM as it would require a large number of changes and we are
> >> currently focusing all our development work on Open MPI. The best I
> >> can suggest is to not spawn as many processes per node.
> >
> > Thanks for the reply. At least this will put a stop to my endless
> > search for "what did I screw up".
> >
> > Two questions, though:
> >
> > 1. am I less likely to run into these problems if I switch to Open
> > MPI?
> >
> > 2. Would having many lamd per node (thus, with a lot fewer slaves per
> > daemon) help?
>
> You are less likely to run into this particular problem with Open MPI
> than with LAM. However, if you are using MPI_COMM_SPAWN to start the
> processes, I would warn you that Open MPI's support for spawning is
> not nearly as stable as that found in LAM/MPI.
Ahahaha. Thanks for the caveat.
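(For anyone finding this thread in the archives: the spawning Brian refers
to is MPI_COMM_SPAWN from the MPI-2 dynamic process interface. A minimal
call in C looks roughly like the sketch below; the "worker" executable
name and the count of 4 are placeholders, not my actual setup.)

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        int errcodes[4];

        MPI_Init(&argc, &argv);

        /* Start 4 copies of a separate executable ("worker" is a
           placeholder name); the parent talks to the children over
           the returned intercommunicator. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, errcodes);

        MPI_Finalize();
        return 0;
    }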
>
> Running more than one lamd per node is only possible if the daemons
> are in different "universes". This should allow you to run more
> processes per node, but keep in mind that a process in one universe
> can not communicate with a process in another universe.
>
>
Yes, sure. This is very well explained in the documentation and in the
entry about LAM_..._PREFIX. The different lamd would take care of
completely different (and non-communicating) jobs.
I've finally been able to stabilize things. I allow a maximum of four
(independent) universes per node (each universe with 4 lamd per node),
with a very rudimentary queueing system (which does not open any new
universes until one of the other four closes). It seems to hold up even
to unrealistically high numbers of simultaneous users.
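(For the record, the queueing gate is conceptually just a counting
semaphore around lamboot/lamhalt. The C sketch below shows the idea only,
not my actual script; the hostfile, executable and session-suffix value
are placeholders, and LAM_MPI_SESSION_SUFFIX is the multiple-universe
mechanism from the LAM documentation, as far as I recall.)

    /* compile with: cc gate.c -o gate -pthread */
    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch: allow at most 4 concurrent LAM universes on this node. */
    int main(void)
    {
        sem_t *slots = sem_open("/lam_universe_slots", O_CREAT, 0600, 4);
        if (slots == SEM_FAILED) { perror("sem_open"); return 1; }

        sem_wait(slots);                  /* wait for a free universe slot */

        /* Give this universe its own session directory so its lamd does
           not collide with the other universes (placeholder value). */
        setenv("LAM_MPI_SESSION_SUFFIX", "universe-A", 1);

        system("lamboot hostfile");       /* boot a private universe      */
        system("mpirun C ./my_mpi_job");  /* run the job inside it        */
        system("lamhalt");                /* tear the universe down       */

        sem_post(slots);                  /* release the slot             */
        sem_close(slots);
        return 0;
    }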
Thanks a lot again.
Best,
R.
> Brian
>
> --
> Brian Barrett
> LAM/MPI developer and all around nice guy
> Have a LAM/MPI day: http://www.lam-mpi.org/
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz