On Monday, November 10, 2003, at 04:27 AM, jess michelsen wrote:
> I replaced the Intel gigabit driver (e1000) version 4.3.2-k1 NAPI by the
> latest version (5.2.20). Also, I replaced the OS kernel
> (linux-2.4.18-14) by linux-2.4.20-20.8. Since then, the MPI
> communication between two nodes seems to be much more stable, i.e. I
> didn't see any 'hangs' until now. However, both latency and bandwidth
> are deteriorated. Latency is now around 250 usec and bandwidth around
> 300 Mbit/sec (they were 120 usec, 600 Mbit/sec).
>
> 1) Could the changed OS kernel be responsible (in any part) for this?
>
> 2) Should LAM-MPI be re-installed, i.e. is the intel driver linked into
> the LAM or MPI programs?
>
> 3) Has anybody studied, how the parameters for the e1000 driver (which
> are set when the ethernet devices are activated - the e1000 driver is a
> module, not compiled into the kernel) affect performance. Is there an
> optimal and safe setting? In our case, we will part of the time be
> latency-bound, and the packet sizes are normally below 64Kbyte. So, both
> latency and bandwidth need to be as optimal as possible w/o sacrificing
> stability.
I can't add much about (1) and (3), other than to say that the kernel
version change could very well be responsible. It wouldn't be the
first time that a kernel change has caused strange performance changes.

As for (2), there is no need to reinstall LAM/MPI. LAM does not require
a relink for changes to GigE drivers, as we only interface to the card
through the standard library calls. Myrinet is really the only network
we support right now where changing the driver software might require a
LAM rebuild.
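
To put the quoted numbers in perspective, here is a rough sketch of what
they would mean for a single message. It assumes a simple linear cost
model (time = latency + size / bandwidth), which is only an approximation
and not something measured in this thread; the function name and the
message sizes are made up for illustration.

```python
def transfer_time_usec(size_bytes, latency_usec, bandwidth_mbit_per_sec):
    """Estimated one-way time for one message, in microseconds.

    Assumed model: fixed latency plus size / bandwidth.
    bits / (Mbit/sec) comes out directly in microseconds.
    """
    bits = size_bytes * 8
    return latency_usec + bits / bandwidth_mbit_per_sec


# Figures quoted in the original post (old setup vs. new setup).
OLD = {"latency_usec": 120, "bandwidth_mbit_per_sec": 600}
NEW = {"latency_usec": 250, "bandwidth_mbit_per_sec": 300}

if __name__ == "__main__":
    # Illustrative message sizes up to the 64 Kbyte mentioned in the post.
    for size in (1024, 8 * 1024, 64 * 1024):
        old_t = transfer_time_usec(size, **OLD)
        new_t = transfer_time_usec(size, **NEW)
        print(f"{size:6d} bytes: old {old_t:7.1f} usec, "
              f"new {new_t:7.1f} usec, slowdown x{new_t / old_t:.2f}")
```

Under this model, messages of any size up to 64 Kbyte come out roughly
twice as slow with the new figures, since both the latency and the
bandwidth got worse by about a factor of two.
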
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/