On Sun, 2 Dec 2001, Cesar Delgado wrote:
> I installed a MOSIX cluster (5 nodes) and am wondering how I go about
> setting up LAM on it. I read the documentation on MOSIX and they talk
> about their "fork and forget" business, so I installed LAM from the
> RPM in RH7.2 and added 5 instances of localhost on to the
> lam-bhost.def hoping that that would create 5 separate processes and
> MOSIX would take care of the rest. As it turns out, all the processes
> stay on the node I called them up in and none get load balanced (or
> forked out I guess would be a better way of putting it). Do I have to
> set up LAM just like I would if I didn't have MOSIX (setting up rsh and
> NFS) to get it to run on all nodes?
Yes and no.
Yes: You'll need to have LAM installed on your MOSIX cluster like any
other cluster.
No: MOSIX won't play nicely with LAM or any other MPI implementation that
the MOSIX developers haven't specifically tailored for MOSIX (I don't think
that they have). The reason is that LAM uses sockets and/or
shared memory to communicate between MPI processes. If MOSIX moves an MPI
process to another machine, LAM suddenly has no idea where that new
process is, and (to make a long story short) won't be able to communicate
with it.
I don't know if MOSIX will automatically re-map shared memory if a process
migrates to a new machine, but I do know that sockets will not follow
processes to a new machine. Hence, if MOSIX migrates an MPI process, its
peer MPI processes will suddenly have stale socket file descriptors, and
future communication will be impossible.
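
As a rough local illustration of the stale-descriptor problem (an analogy
in Python, not LAM or MOSIX code): the sketch below closes one end of a
socket pair to stand in for "process B has migrated away", then tries to
keep using the other end. The function name is made up, and treating a
migration as the old endpoint vanishing is an assumption.

```python
import socket


def talk_after_peer_leaves():
    """Use a connected socket before and after its peer endpoint goes away."""
    a, b = socket.socketpair()      # A and B start out connected

    a.sendall(b"hello")
    first = b.recv(5)               # works while B is still at this endpoint

    b.close()                       # stand-in for "B was migrated elsewhere"

    try:
        a.sendall(b"are you there?")
        outcome = "sent"
    except OSError as exc:          # A's descriptor now points at nothing useful
        outcome = type(exc).__name__
    finally:
        a.close()
    return first, outcome


if __name__ == "__main__":
    print(talk_after_peer_leaves())
```

The first message arrives; the second send fails with an OSError
(typically a broken-pipe error), because A's descriptor has no way of
learning where B went.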
I see notes on the MOSIX web site about development of "migrating
sockets", but there is no indication of how far along that project is;
you may want to ask them about it. Even so, if/when MOSIX supports the
migration of sockets, it's only a stopgap measure.
Consider: MPI processes A and B are on different machines. B suddenly
gets migrated to a new machine (not the same as A). Even if MOSIX is able
to maintain the sockets between processes A and B, there's likely to be at
least one additional hop for every MPI message, which could cause
communication-bound applications to really slow down to a crawl. (Note:
it is possible to migrate sockets without adding additional hops, but that
becomes *really* complicated because of some really sticky race conditions
-- I don't know what the MOSIX crew is planning to do). Indeed, in a
worst case scenario, if migrating sockets keeps adding another hop every
time a process migrates, the latency on each MPI message will grow
whenever A or B is migrated.
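
To put rough numbers on that worst case, here is a back-of-the-envelope
model, assuming each migration leaves exactly one forwarding hop behind
and every hop costs a fixed amount. The function name and the 50
microsecond figure are invented for illustration; they are not
measurements of MOSIX or LAM.

```python
def message_latency(base_latency_us, migrations, hop_latency_us=None):
    """Per-message latency if every past migration adds one forwarding hop.

    Assumption: a forwarding hop costs the same as the original direct hop
    unless hop_latency_us says otherwise.
    """
    if hop_latency_us is None:
        hop_latency_us = base_latency_us
    return base_latency_us + migrations * hop_latency_us


# Latency after 0, 1, 2 and 3 migrations, with a made-up 50 us direct hop.
latencies = [message_latency(50.0, n) for n in range(4)]

if __name__ == "__main__":
    print(latencies)   # [50.0, 100.0, 150.0, 200.0]
```

Under these assumptions the per-message latency grows linearly with the
number of migrations, which is what hurts a communication-bound job.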
There are other issues as well (e.g., the MPI implementation caching peer
MPI process IP addresses, requiring new communication devices depending on
the destination of the migration, etc.), but this is one of the big ones.
The real solution is to integrate the MPI implementation with the
migration system. Simply put: the MPI implementation must be involved in
the migration; it cannot be passive. I don't know if the MOSIX system
supports such hooks/callbacks to notify processes (and/or their parents)
that they will be migrated.
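
A minimal sketch of what "being involved in the migration" could look
like on the MPI side, assuming the migration system offered
before/after notifications. Everything here is hypothetical: PeerTable,
before_migrate and after_migrate are invented names, not MOSIX or LAM
APIs, and the host names are placeholders.

```python
class PeerTable:
    """Hypothetical MPI-side map from rank to (host, port)."""

    def __init__(self):
        self._addr = {}
        self._paused = set()

    def register(self, rank, host, port):
        self._addr[rank] = (host, port)

    def before_migrate(self, rank):
        # Hypothetical hook: hold traffic to this rank while it is moving.
        self._paused.add(rank)

    def after_migrate(self, rank, new_host, new_port):
        # Hypothetical hook: replace the cached address, then resume traffic.
        self._addr[rank] = (new_host, new_port)
        self._paused.discard(rank)

    def route(self, rank):
        """Return where to send messages for rank, or fail if it is in flight."""
        if rank in self._paused:
            raise RuntimeError(f"rank {rank} is migrating")
        return self._addr[rank]


if __name__ == "__main__":
    peers = PeerTable()
    peers.register(1, "node2", 5000)
    peers.before_migrate(1)
    peers.after_migrate(1, "node4", 5000)
    print(peers.route(1))
```

The point is only that the MPI layer updates its own view of where each
peer lives, rather than relying on the old sockets being kept alive
behind its back.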
So for now, all your MPI jobs will have to be run without MOSIX migrating
them.
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/