Christopher Porter wrote:
>
> I tried lambooting daemons as one user and running an mpi
> application as another user (in hopes promiscuity would
> allow the daemon to connect to my mpirun process) but that
> fails the same as it does without the --with-boot-promisc enabled.
The solution we used, which worked phenomenally well
and was also efficient in terms of startup time (if slightly
wasteful of memory), was to prestart lamd daemons as 16 or 32
different users (lamuser01 -> lamuser32) and then to assign
lamusers in round-robin fashion from the unoccupied pool
to real users. This requires only a very simple management
layer, and is also amenable to dynamic scaling.
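
To make the pool idea concrete, here is a rough sketch of the bookkeeping
such a management layer might do. It is not the code we ran; the class and
method names (LamUserPool, acquire, release) are made up for illustration,
and it only tracks which lamuser is handed to which real user. Starting
the lamds and launching jobs under the assigned account are left out.

```python
from collections import deque


class LamUserPool:
    """Hypothetical bookkeeping for a pool of pre-started LAM accounts."""

    def __init__(self, size=32, prefix="lamuser"):
        # Free accounts in hand-out order: lamuser01 .. lamuserNN.
        self.free = deque(f"{prefix}{i:02d}" for i in range(1, size + 1))
        self.assigned = {}  # real user -> pool account

    def acquire(self, real_user):
        """Return the pool account for real_user, or None if none is free."""
        if real_user in self.assigned:
            return self.assigned[real_user]
        if not self.free:
            return None
        # Round-robin: always take the account that has been idle longest.
        account = self.free.popleft()
        self.assigned[real_user] = account
        return account

    def release(self, real_user):
        """Put real_user's account at the back of the free queue."""
        account = self.assigned.pop(real_user, None)
        if account is not None:
            self.free.append(account)
        return account


pool = LamUserPool(size=4)
print(pool.acquire("alice"))  # lamuser01
print(pool.acquire("bob"))    # lamuser02
pool.release("alice")
print(pool.acquire("carol"))  # lamuser03
```

Released accounts go to the back of the queue, so each lamuser gets reused
in turn rather than the same one being hammered repeatedly.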
In this case the compute nodes were topologically isolated
from all other subnets, and communicated with a dual-homed
master. The master itself was behind a gateway which ran
a simple mediating server, which was the interface that users
saw. All interesting system events were propagated to
the gateway, where they could be browsed. For example:
http://ldas-dev.ligo.caltech.edu/ldas_outgoing/logs/LDASmpi.log.html
That link points to a system reserved for developers for the
purpose of debugging.
There is an abandoned wiki entry giving slightly more detail than
this email at:
http://www.mpi-comm-world.org/index.cgi?Low_Latency_Batch_Processing_With_LAM
Phil