On Thu, 19 Jun 2003, Andrey Slepuhin wrote:
> I read their articles, but it seems that they solve another problem:
> having multiple interfaces and multiple switches, how to route packets
> depending on destination address. But I want to do the following: having
> only one switch and two network interfaces on each node, I want to attach
> each of the two MPI processes running on a node to a separate network
> interface to avoid collisions, while keeping shmem communication between
> processes on the same node.
Gotcha.
Have you considered channel bonding? I don't know what the current
state-of-the-art is with regards to channel bonding. I've never tried it
myself -- I've heard both success and failure stories about it. There are
two factors here -- latency and bandwidth.
The reason that I ask is because regardless of the route you take (pardon
the pun), LAM is still single threaded. Hence, the OS will be the one
that makes progress on the underlying write()'s and read()'s. So whether
they're going across two different TCP sockets or you have them channel
bonded, the OS is responsible for making progress across those two
sockets. My only point here is that if you can do a quick-n-dirty channel
bonding setup and get that working, it may be easier than modifying LAM.
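To make "the OS makes progress" concrete, here's a minimal sketch -- 
emphatically NOT LAM's actual progress engine -- of one thread servicing
two already-connected TCP sockets with select():

/* One single-threaded process making progress on two TCP sockets at
 * once: ask the kernel (via select()) which descriptors are readable,
 * then read() from whichever is ready.  sock_a and sock_b are assumed
 * to be already-connected TCP sockets. */
#include <sys/select.h>
#include <unistd.h>

void drain_two_sockets(int sock_a, int sock_b)
{
    char buf[4096];
    int maxfd = (sock_a > sock_b) ? sock_a : sock_b;

    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(sock_a, &readable);
        FD_SET(sock_b, &readable);

        /* Block until at least one socket has data */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            break;

        if (FD_ISSET(sock_a, &readable) &&
            read(sock_a, buf, sizeof(buf)) <= 0)
            break;  /* EOF or error on socket A */
        if (FD_ISSET(sock_b, &readable) &&
            read(sock_b, buf, sizeof(buf)) <= 0)
            break;  /* EOF or error on socket B */
    }
}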
Modifying LAM is certainly possible; there are multiple factors involved:
- if every MPI process potentially has two IP addresses, you'll need to
decide which one goes to which process, or have every process listen on
both. This may seem trivial, but it's a fair amount of logistical work
(before you even get to the interesting stuff). For example, if you have
two sends pending to the *same* process, do they use different sockets
(which would be hard), or do they use the same socket, with some kind of
ordering such that you spread the use of the two NICs on a per-process
basis, not a per-message basis? The latter mainly means coming up with a
distribution scheme that will actually provide some benefit to your
applications (see the sketch after this list).
- LAM currently gets the IP address of each MPI process from the lamd
routing table. You'd have to modify this scheme (although it probably
wouldn't be too hard) to have each MPI process send its IP address around
to each of its non-local peers; the sketch below illustrates the exchange.
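Here's what I mean, as a rough sketch -- NOT the actual RPI code. Loud
assumptions: the two NIC addresses below are made up, "rank % 2" is a
crude stand-in for a real local-rank computation, and MPI_Allgather
merely stands in for whatever out-of-band channel (e.g., the lamd) would
really carry the addresses at RPI init time, since the RPI obviously
can't call MPI during its own setup:

#include <mpi.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical addresses of this node's two NICs */
    const char *nic_addr[2] = { "192.168.1.10", "192.168.2.10" };
    int local_rank = rank % 2;   /* which process on this node we are */

    /* Bind the listening socket to the NIC chosen for this process so
     * that all of this process's TCP traffic uses that interface. */
    struct sockaddr_in me;
    memset(&me, 0, sizeof(me));
    me.sin_family = AF_INET;
    me.sin_port = 0;             /* let the kernel pick a port */
    inet_pton(AF_INET, nic_addr[local_rank], &me.sin_addr);

    int listener = socket(AF_INET, SOCK_STREAM, 0);
    bind(listener, (struct sockaddr *) &me, sizeof(me));
    listen(listener, size);

    /* Learn which port the kernel assigned */
    socklen_t len = sizeof(me);
    getsockname(listener, (struct sockaddr *) &me, &len);

    /* Tell every peer which address/port to connect() to */
    struct sockaddr_in all[size];        /* C99 VLA; fine for a sketch */
    MPI_Allgather(&me, (int) sizeof(me), MPI_BYTE,
                  all, (int) sizeof(me), MPI_BYTE, MPI_COMM_WORLD);

    /* ... non-local peers would now connect() to all[peer] ... */

    MPI_Finalize();
    return 0;
}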
So, actually thinking about this a little more, and thinking about how the
TCP components of the RPI set up and work: if you simply load balance per
MPI process across the NICs, perhaps this wouldn't be *too* difficult to
do... (but then again, keep in mind my bias as LAM's main RPI expert ;-).
I think if you modify the initialization-time stuff, the rest of the
progress engine will work exactly the same. But if you want to multiplex
across the sockets, it'll be a bit harder.
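For a sense of why: the moment you stripe a single message across both
sockets, fragments can arrive interleaved on the two connections, so every
fragment needs reassembly metadata. A hypothetical sketch (nothing like
this exists in LAM's TCP RPI today):

/* Hypothetical per-fragment header for striping across two sockets */
struct stripe_header {
    int src_rank;   /* which peer sent this fragment            */
    int msg_seq;    /* per-peer message sequence number         */
    int frag_off;   /* byte offset of this fragment in the msg  */
    int frag_len;   /* length of this fragment's payload        */
};
/* The sender alternates sockets per fragment; the receiver uses
 * (src_rank, msg_seq, frag_off) to copy each payload into place and
 * declares the message complete once every byte has arrived. */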
But it's still worth a try with channel bonding to see if that gives you
what you want, too.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/