On Thu, 26 May 2005, Christian Simon wrote:
> Hi everyone.
>
> I have a cluster with G5 xserve cluster nodes. System is MacOSX server
> 10.3.9,
> the switch is a GigE Asante (G52400W); lam-mpi version 7.1.1 was compiled
> with xlf/xlc.
>
> Everything works fine with the default network settings for the nodes.
>
> As soon as I try larger MTU than 1500, the lam-mpi environment fails to
> start:
> lamboot hangs forever.
The Asante GX5-2400W does not support jumbo frames. You could try frame
sizes above 1500 (in increments of, say, 500) to find the maximum, but it
is probably very close to 1500.
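
As a rough sketch of that stepping search, something like the following
could be used. The probe itself is left as a hypothetical callable,
works(mtu), which you would have to supply (for example, by setting the
MTU on two nodes and checking that a large transfer between them gets
through). The names find_max_mtu and works, and the 9000 upper limit, are
my own assumptions for illustration, not anything provided by LAM or the
switch.

```python
from typing import Callable


def find_max_mtu(works: Callable[[int], bool],
                 start: int = 1500,
                 step: int = 500,
                 limit: int = 9000) -> int:
    """Step the MTU upward from `start` in increments of `step`.

    `works` is a hypothetical probe supplied by the caller; it should
    return True when traffic at the given MTU passes through the switch.
    The search assumes `start` itself works and stops at the first
    failure, returning the last MTU that passed.
    """
    best = start
    mtu = start + step
    while mtu <= limit and works(mtu):
        best = mtu
        mtu += step
    return best


if __name__ == "__main__":
    # Stand-in probe: pretend the switch drops anything above 1500 bytes.
    print(find_max_mtu(lambda mtu: mtu <= 1500))
```

If the first step already fails, the function simply returns the starting
value, which is what I would expect for a switch without jumbo support.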
In my experience with MPI over Ethernet, jumbo frames usually matter less
than switch performance. If you test transfer rates on an idle machine,
you will find that the speed is about the same whether you use an MTU of
1500 or an MTU of 9000. However, the transfer itself can easily use 30% of
the processor's time (this is for a 2.0GHz Opteron, and probably similar
on a G5). If the sends are non-blocking, then the available processing
power is down 30% during the communication time. Jumbo frames should cut
that to about 15% on average. So for CPU-intensive calculations which are
also communications-intensive, jumbo frames "may" speed things up by a few
percent.
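
To put a number on "a few percent", here is a back-of-the-envelope model.
It assumes the computation overlaps with non-blocking communication for
some fraction of the run, and that the CPU loses 30% (MTU 1500) or 15%
(jumbo frames) of its capacity during that fraction only. The 20% overlap
used in the example is purely an illustrative assumption, as is the
function name.

```python
def jumbo_speedup(comm_fraction: float,
                  cpu_cost_standard: float = 0.30,
                  cpu_cost_jumbo: float = 0.15) -> float:
    """Estimate the compute speedup from jumbo frames.

    comm_fraction is the assumed share of wall-clock time during which
    non-blocking transfers are in flight. The cpu_cost_* values are the
    shares of CPU time eaten by the transfer itself (the 30% and 15%
    figures from the text above).
    """
    available_standard = 1.0 - comm_fraction * cpu_cost_standard
    available_jumbo = 1.0 - comm_fraction * cpu_cost_jumbo
    return available_jumbo / available_standard


if __name__ == "__main__":
    # With transfers in flight 20% of the time: 0.97 / 0.94, about 1.03.
    print(round(jumbo_speedup(0.20), 3))
```

Even if communication were in flight the whole time, this model tops out
at 0.85 / 0.70, or roughly a 21% gain, and realistic overlap fractions
give much less.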
You should look at trunking. It probably won't save CPU usage, but it
should halve transmission times.
> I guess -and I hope- it is not, strictly speaking, a lam-mpi problem.
> Did I miss something in the FAQ ? Any suggestion ?
> --
> Christian
>
>
>
------------------------------------------------------------
Anthony Ciani (aciani1_at_[hidden])
Computational Condensed Matter Physics
Department of Physics, University of Illinois, Chicago
http://ciani.phy.uic.edu/~tony
------------------------------------------------------------