Hi Roberto,
Your triangle network topology is what our research terminology calls a
switchless FNN (Flat Neighborhood Network); see http://aggregate.org/FNN/
for details. There is a patch to LAM that makes it work with FNNs; it is
included in the 6.6b1 beta and in any recent CVS version.
In those versions of LAM, lamboot has an option that helps with a
multiple-NIC-per-node setup like yours:
-l Use local hostname resolution (vs. centralized name lookup)
You will need to assign unique IP addresses for each NIC, and set up the
/etc/hosts and /etc/nsswitch.conf files on each node as follows:
NIC info for node A:
eth0 NETWORK=10.0.1.0 IPADDR=10.0.1.1 (connected to node B's eth0)
eth1 NETWORK=10.0.2.0 IPADDR=10.0.2.1 (connected to node C's eth0)
NIC info for node B:
eth0 NETWORK=10.0.1.0 IPADDR=10.0.1.2 (connected to node A's eth0)
eth1 NETWORK=10.0.3.0 IPADDR=10.0.3.2 (connected to node C's eth1)
NIC info for node C:
eth0 NETWORK=10.0.2.0 IPADDR=10.0.2.3 (connected to node A's eth1)
eth1 NETWORK=10.0.3.0 IPADDR=10.0.3.3 (connected to node B's eth1)
With each NETMASK set to 255.255.255.0, this yields 3 subnets, one
for each edge of your triangle.
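The table above follows a simple pattern: the k-th edge of the triangle
gets subnet 10.0.k.0/24, a node's last octet is its position (A=1, B=2,
C=3), and each node uses eth0, eth1, ... in link order. Here is a minimal
Python sketch that reproduces the table under exactly those assumptions.
The plan_links name and the numbering rule are read off the example, not
taken from any FNN or LAM tool, and the rule only covers the case where
every pair of nodes has its own direct cable (as in your triangle), not a
general switched FNN.

```python
from itertools import combinations


def plan_links(nodes):
    """Assign one /24 subnet per direct cable between every pair of nodes.

    Assumed numbering (matches the hand-made table): the k-th link, in
    itertools.combinations order, is network 10.0.k.0/24; a node's host
    octet is its 1-based position in `nodes`; NICs are eth0, eth1, ...
    in link order. Returns {node: [(iface, ipaddr, network, peer, peer_iface)]}.
    """
    plan = {n: [] for n in nodes}
    for k, (a, b) in enumerate(combinations(nodes, 2), start=1):
        network = f"10.0.{k}.0"
        # Next free NIC on each end of this link.
        if_a, if_b = f"eth{len(plan[a])}", f"eth{len(plan[b])}"
        ip_a = f"10.0.{k}.{nodes.index(a) + 1}"
        ip_b = f"10.0.{k}.{nodes.index(b) + 1}"
        plan[a].append((if_a, ip_a, network, b, if_b))
        plan[b].append((if_b, ip_b, network, a, if_a))
    return plan


if __name__ == "__main__":
    for node, nics in plan_links(["A", "B", "C"]).items():
        print(f"NIC info for node {node}:")
        for iface, ip, net, peer, peer_if in nics:
            print(f"  {iface} NETWORK={net} IPADDR={ip} "
                  f"(connected to node {peer}'s {peer_if})")
```

Every NETMASK is still 255.255.255.0; the sketch breaks down past 254
links or nodes, which is far beyond what a direct-cabled mesh can hold anyway.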
For /etc/hosts, you need to give each node a personalized list that maps
each neighbor's name to that neighbor's address on the link they share:
/etc/hosts for node A:
10.0.1.1 nodeA   # mostly a placeholder so nodeA knows itself
10.0.1.2 nodeB
10.0.2.3 nodeC
/etc/hosts for node B:
10.0.1.1 nodeA
10.0.1.2 nodeB   # mostly a placeholder so nodeB knows itself
10.0.3.3 nodeC
/etc/hosts for node C:
10.0.2.1 nodeA
10.0.3.2 nodeB
10.0.2.3 nodeC   # mostly a placeholder so nodeC knows itself
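The three per-node lists above follow one rule: a neighbor's name maps to
the neighbor's address on the cable it shares with this node, and the node
itself maps to its eth0 address as a placeholder. A minimal Python sketch
of that rule, with the NIC table from this message hardcoded; the NICS
dict and hosts_for function are hypothetical names, and the code assumes
every pair of nodes shares a direct link, as in your triangle.

```python
# NIC table from this message: node -> [(ipaddr, directly connected peer), ...]
# The first entry is eth0, the second eth1.
NICS = {
    "nodeA": [("10.0.1.1", "nodeB"), ("10.0.2.1", "nodeC")],
    "nodeB": [("10.0.1.2", "nodeA"), ("10.0.3.2", "nodeC")],
    "nodeC": [("10.0.2.3", "nodeA"), ("10.0.3.3", "nodeB")],
}


def hosts_for(node, nics):
    """Return the personalized /etc/hosts lines for `node`."""
    lines = []
    for name in sorted(nics):
        if name == node:
            # Placeholder so the node can resolve its own name (its eth0 address).
            lines.append(f"{nics[node][0][0]} {name}")
        else:
            # The neighbor's address on the link that connects it to `node`.
            ip = next(ip for ip, peer in nics[name] if peer == node)
            lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for node in sorted(NICS):
        print(f"/etc/hosts for {node}:")
        print(hosts_for(node, NICS), end="")
```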
For /etc/nsswitch.conf, you need to make sure that the "hosts" line
has "files" as the first choice for name resolution. For example:
hosts: files dns
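If you are not sure whether a node's /etc/nsswitch.conf already has
"files" first, a small check like this Python sketch can tell you. The
files_first name is hypothetical; the sketch strips comments and looks
only at the first service listed on the "hosts" line.

```python
def files_first(nsswitch_text):
    """Return True if the "hosts" line lists "files" as its first source."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        key, sep, rest = line.partition(":")
        if sep and key.strip() == "hosts":
            sources = rest.split()
            return bool(sources) and sources[0] == "files"
    return False  # no "hosts" line at all
```

On a node you would call it as, for example,
files_first(open("/etc/nsswitch.conf").read()).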
Now, when you do a lamboot, use the -l option and give it a lamhosts
file containing:
nodeA
nodeB
nodeC
Then use mpirun as normal. Each node will talk to its neighbors through
the appropriate NIC. Yeah, it's a lot of work to set this up by hand.
I'm still working on a simpler approach for FNN use and setup.
I hope that was detailed enough to get things working for you. :-)
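Why the appropriate NIC gets used: node A only has directly connected
subnets 10.0.1.0/24 (eth0) and 10.0.2.0/24 (eth1), so once its
/etc/hosts resolves nodeC to 10.0.2.3, the kernel sends that traffic out
eth1. This Python sketch mimics that connected-subnet lookup for node A
using the standard ipaddress module. It is a simplification (the real
kernel consults its full routing table), and NODE_A_NICS and outgoing_nic
are hypothetical names. It also shows why C's other address, 10.0.3.3,
would be the wrong entry in node A's hosts file: A has no NIC on that subnet.

```python
import ipaddress

# Node A's NICs from the table in this message (NETMASK 255.255.255.0 = /24).
NODE_A_NICS = {
    "eth0": ipaddress.ip_interface("10.0.1.1/24"),
    "eth1": ipaddress.ip_interface("10.0.2.1/24"),
}


def outgoing_nic(dest, nics):
    """Return the NIC whose directly connected subnet contains dest, else None."""
    addr = ipaddress.ip_address(dest)
    for name, iface in nics.items():
        if addr in iface.network:
            return name
    return None  # not on any directly connected subnet
```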
Oh, one more thing: you may want to turn on the ARP filter by adding
these lines to the /etc/sysctl.conf file on each node:
#turn on ARP filters
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.default.arp_filter = 1
Linux has a nasty default setting (arp_filter = 0) that lets the kernel
answer ARP requests with addresses belonging to other interfaces, which
can mess up the proper behaviour of ARP in more "interesting" networks
such as this. In your specific case the ARP filter isn't necessary, but
for general FNNs it is a must to get any acceptable performance.
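To confirm the setting took effect (after a reboot, or after loading
/etc/sysctl.conf with sysctl -p), you can read the values back from
/proc/sys. A minimal Python sketch, assuming the standard Linux
/proc/sys/net/ipv4/conf layout; the function names are hypothetical. As
the kernel treats it, the filter is active on an interface if either the
"all" entry or that interface's own entry is set.

```python
import os


def arp_filter_settings(conf_dir="/proc/sys/net/ipv4/conf"):
    """Return {entry: value} for each entry (all, default, eth0, ...) with an arp_filter file."""
    settings = {}
    for name in sorted(os.listdir(conf_dir)):
        path = os.path.join(conf_dir, name, "arp_filter")
        if os.path.isfile(path):
            with open(path) as f:
                settings[name] = int(f.read().strip())
    return settings


def arp_filter_on(settings, iface):
    """Active if either conf/all or conf/<iface> has arp_filter set."""
    return bool(settings.get("all", 0) or settings.get(iface, 0))
```

For example, arp_filter_on(arp_filter_settings(), "eth1") should return
True on each node once the two sysctl lines are in effect.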
Enjoy!
--
Tim Mattox - tmattox_at_[hidden] - http://home.earthlink.net/~timattox
http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/
On Mon, 21 Apr 2003, R.C.Pasianot wrote:
> Hello there,
>
> I seem to recollect this has been asked before; would someone please
> either point me to the right place (sorry, searching over our internet
> connection is a real pain) or give me a quick answer?
>
> Here's the scenario. I have 3 hosts, A, B, and C, each one furnished
> with 2 NICs. So I connect them as if on the vertices of a triangle
> (I don't have a spare switch):
>
>     A
>    / \
>   /   \
>  /     \
> B-------C
>
> Now I "lamboot", say from A, and want all the hosts to communicate among
> themselves using the shortest paths, namely the sides AB, BC and CA.
>
> Is it possible to do this? What would a lamhosts file look like?
>
> Thanks a lot. Regards,
>
> Roberto
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/