There is a solution for exactly this problem at the CVS HEAD (other PBS
users asked for this a few months ago :-) -- a new [as yet undocumented]
feature called a "hostmap".
Here's a snippet from the bug (535), written when I added the functionality:
-----
Host mapping functionality can be applied to all non-out-of-band
communication (i.e., MPI communication -- not native LAM/nsend-based
communication). This nicely fits the "slow/admin" and "fast/parallel"
network model.
There is now a new SSI parameter "mpi_hostmap" (its prefix of "mpi" is
meant to imply that it applies to all MPI SSI modules). It defaults to
$sysconfdir/etc/lam-hostmap.txt (you can see this in "laminfo -param all
all").
There is also now a new default (empty) lam-hostmap.txt file that gets
installed into $sysconfdir if there isn't one there already (similar to
lam-bhost.def) -- so we won't overwrite people's hostmaps if they've
already installed/customized one. It gives a short explanation of the
simple format of the hostmap file.
All rpi modules have been converted to use this hostmap functionality
(except lamd, of course). Since no coll modules [yet] implement their
own progress engines, no conversion was necessary.
-----
The file trillium/etc/lam-hostmap.txt is the empty/sample hostmap that the
text above refers to. It provides an explanation of the file format, etc.
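As a rough illustration of the kind of name translation in question (this is
not the hostmap file format itself, which is explained in the sample file),
here is a minimal Python sketch. The function name mangle_hostname and the
two patterns are assumptions for illustration only, taken from the
"nodeXXX"/"gigeXXX" and 192.168.1.xxx/192.168.2.xxx conventions in the
quoted message below; a real site would substitute its own convention.

```python
import re


def mangle_hostname(name: str) -> str:
    """Map an admin-network name or IP to its MPI-network counterpart.

    Hypothetical helper: the patterns below only mirror the example
    naming convention from the quoted question, not anything in LAM.
    """
    # nodeXXX (admin interface) -> gigeXXX (MPI interface)
    match = re.fullmatch(r"node(\d+)", name)
    if match:
        return "gige" + match.group(1)

    # 192.168.1.xxx (admin subnet) -> 192.168.2.xxx (MPI subnet)
    match = re.fullmatch(r"192\.168\.1\.(\d{1,3})", name)
    if match:
        return "192.168.2." + match.group(1)

    # Anything that does not match either convention is left unchanged.
    return name


if __name__ == "__main__":
    for host in ["node017", "192.168.1.17", "frontend"]:
        print(host, "->", mangle_hostname(host))
```

The output of such a script could then be used to fill in a hostmap by hand,
following whatever format the sample lam-hostmap.txt describes.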
Let me know if this works out for you.
On Wed, 25 Feb 2004, Bogdan Costescu wrote:
>
> Hi!
>
> I'm trying to come up with a method of mangling hostnames when using
> the "tm" boot module. By mangling hostnames I mean using nodes that
> have multiple IP-based network interfaces when the node is known by
> one name assigned to the IP of one interface, but for the MPI
> communication another IP (and another interface) should be used.
>
> This is pretty simple to do with the "rsh" boot module, as this takes
> a boot schema where the node names/IPs can be specified and a simple
> naming convention like "nodeXXX" corresponds to "gigeXXX" or
> 192.168.1.xxx corresponds to 192.168.2.xxx (for "normal" and MPI
> interfaces respectively) requires only a very short
> sed/awk/perl/python/etc. script to translate.
>
> However, the "tm" module uses by design the hostnames given by *PBS,
> without any means of making a translation. This works fine for non-IP
> networks (like GM/Myrinet) which make their own mapping between the
> hostname as given by *PBS and their network addresses. How do other
> people solve this problem ?
> If there is no solution, is this something that would be interesting
> for more users ? How would you like to do the host mangling in this
> case ?
>
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/