Hello,
I have installed LAM 7.1 on the dual-NIC computers in a cluster, but I have
not observed the increase in performance on version 0.6beta of the HPC
Challenge benchmarks (see
http://icl.cs.utk.edu/hpcc/software/index.html) that I hoped the extra
bandwidth would bring. My test cluster has two switches, one
for each network, and each node is connected to both networks. I followed
the helpful advice of Tim Mattox (see
http://www.lam-mpi.org/MailArchives/lam/msg05767.php) on using lamboot -l
and on setting up the hosts and nsswitch.conf files, but the benchmark
results were no better with hosts files that listed two IP addresses per
node than with hosts files that listed only one. I do, however, notice
an improvement over MPICH 1.2.5, which might be due to the round-robin
socket writing mentioned by Jeff Squyres (see
http://www.lam-mpi.org/MailArchives/lam/msg04604.php). However, Jeff also
states that Open MPI will have "true simultaneous multi-device transport
support" (see http://www.lam-mpi.org/MailArchives/lam/msg08737.php), which
seems to imply that LAM 7.1 does not yet fully support multiple networks.
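To make concrete what I understand by round-robin socket writing, here is a
minimal Python sketch, assuming the simplest possible scheme: a message is cut
into fixed-size chunks and the chunks are dealt out in turn to one socket per
network. The names stripe, unstripe, send_striped, recv_striped, demo, and
CHUNK are hypothetical, the socketpair() connections merely stand in for the
two NICs, and I am not claiming this is how LAM 7.1 actually does it.

```python
import socket

# Hypothetical chunk size in bytes; kept tiny so the striping is easy to follow.
CHUNK = 4


def stripe(payload: bytes, n_links: int, chunk: int = CHUNK) -> list[list[bytes]]:
    """Deal consecutive chunks of payload to the links in round-robin order."""
    plan: list[list[bytes]] = [[] for _ in range(n_links)]
    for i, start in enumerate(range(0, len(payload), chunk)):
        plan[i % n_links].append(payload[start:start + chunk])
    return plan


def unstripe(per_link: list[list[bytes]]) -> bytes:
    """Rebuild the payload by taking chunks from the links in the same order."""
    n_links = len(per_link)
    total = sum(len(chunks) for chunks in per_link)
    return b"".join(per_link[i % n_links][i // n_links] for i in range(total))


def recv_until_eof(sock: socket.socket) -> bytes:
    """Read from sock until the peer shuts down its sending side."""
    buf = bytearray()
    while True:
        data = sock.recv(4096)
        if not data:
            return bytes(buf)
        buf += data


def send_striped(payload: bytes, links: list[socket.socket], chunk: int = CHUNK) -> None:
    # Links are written one after another here, which is only safe for small
    # payloads; a real transport would write them concurrently (e.g. with
    # select/poll) so that both networks are busy at the same time.
    for sock, chunks in zip(links, stripe(payload, len(links), chunk)):
        sock.sendall(b"".join(chunks))
        sock.shutdown(socket.SHUT_WR)  # tell the receiver this link is done


def recv_striped(links: list[socket.socket], chunk: int = CHUNK) -> bytes:
    per_link = []
    for sock in links:
        data = recv_until_eof(sock)
        # Every chunk except the overall last one is exactly `chunk` bytes,
        # and that last one ends its link's stream, so fixed-size splitting
        # recovers each link's chunk list.
        per_link.append([data[i:i + chunk] for i in range(0, len(data), chunk)])
    return unstripe(per_link)


def demo(message: bytes, n_links: int = 2) -> bytes:
    # Each socketpair() is a local stand-in for one network/NIC.
    pairs = [socket.socketpair() for _ in range(n_links)]
    try:
        send_striped(message, [a for a, _ in pairs])
        return recv_striped([b for _, b in pairs])
    finally:
        for a, b in pairs:
            a.close()
            b.close()


print(demo(b"one message, two networks"))  # prints b'one message, two networks'
```

Since each link carries only every other chunk, two links could in principle
approach twice the bandwidth of one, but only if the links are driven
concurrently; with the sequential writes above, the second link simply waits
for the first.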
Do I have to rewrite the HPC benchmark code to use multiple NICs, or am I
overlooking a step that would give off-the-shelf MPI software more
bandwidth? And if LAM does use multiple networks automatically, can I turn
this off to measure performance with just one NIC per node?
Thanks,
Mark Dickson