LAM/MPI General User's Mailing List Archives

From: Tobias Wenzel (wtob_at_[hidden])
Date: 2004-06-23 11:00:07


On Thu, 22 June 2004 10:38 Bogdan Costescu wrote:

> ...
> started with the assumption that network congestion can never happen;
> but as soon as the project met real-world with packets being dropped
> or delayed, ...
>
> I don't want to sound like bashing your project. On the contrary and
> as Brian wrote, it's very nice that a group independent of the LAM
> developers created a new RPI. I was actually keeping my eyes on the
> ParMa2 project (http://www.ce.unipr.it/research/parma2/via/via.html)
> until they stopped updating their webpage. I will test your code
> sometime later this summer, after I'll finish setting up our new
> (still TCP/IP/Ethernet) cluster.

You are welcome, and you are absolutely right. Regarding reliable delivery,
we trusted the information on the M-VIA web page
(http://old-www.nersc.gov/research/FTG/via/):

"6/21/01: M-VIA version 1.2b2 (beta) and associated driver sets released. It
works with Linux 2.2 and Linux 2.4. This is the first version to support
Reliable Delivery over Ethernet NICs."

Also, a call to 'VipQueryNic()' reports that the NIC supports reliable
delivery (the VIP_SERVICE_RELIABLE_DELIVERY bit is set in the field
'ReliabilityLevelSupport').
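
For readers who want to repeat that check, here is a minimal sketch of the
bit test in C. It is not taken from our RPI code. The struct and the
EXAMPLE_* flag values are stand-ins invented for this illustration so that
the snippet compiles on its own; real code would include the M-VIA header and
use its VIP_SERVICE_* constants on the attributes filled in by
'VipQueryNic()'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for illustration only (assumption); real code
   should use the VIP_SERVICE_* constants from the M-VIA header. */
#define EXAMPLE_SERVICE_UNRELIABLE         0x01u
#define EXAMPLE_SERVICE_RELIABLE_DELIVERY  0x02u
#define EXAMPLE_SERVICE_RELIABLE_RECEPTION 0x04u

/* Hypothetical, cut-down stand-in for the NIC attributes structure;
   only the field discussed in the text is modelled. */
struct example_nic_attributes {
    unsigned int ReliabilityLevelSupport; /* bit mask of supported levels */
};

/* Returns true if the reliable-delivery bit is set in the mask. */
static bool supports_reliable_delivery(const struct example_nic_attributes *attrs)
{
    assert(attrs != NULL);
    return (attrs->ReliabilityLevelSupport
            & EXAMPLE_SERVICE_RELIABLE_DELIVERY) != 0u;
}
```

Note that such a test only shows what the NIC advertises. It says nothing
about whether the data actually arrives intact under load.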

I'm not sure, but it may not work as promised. As you can see on our web
page, 5 tests of LAM's conformance suite failed. All of these tests produce a
lot of traffic. The debug output shows that at the RPI layer everything works
as expected. All packets arrive properly, but the subsequent check of the
received data fails: a few bytes in the receive buffer still hold the values
they were initialized with. We have not found a definite reason for this, but
to me it still looks like unreliable delivery.
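
The symptom described above (bytes that still hold their initial value after
a receive) can be counted with a small helper. The following C sketch is an
illustration only, not the code of the conformance suite; the fill value and
both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel written into the buffer before the receive. */
#define FILL_BYTE 0xA5u

/* Pre-fill the receive buffer so untouched bytes can be recognised later. */
static void prefill_recv_buffer(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, FILL_BYTE, len);
}

/* Count positions where the received byte differs from the expected byte
   and still equals the sentinel, i.e. it was probably never written. */
static size_t count_stale_bytes(const unsigned char *recv_buf,
                                const unsigned char *expected, size_t len)
{
    size_t stale = 0;
    assert(recv_buf != NULL && expected != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (recv_buf[i] != expected[i] && recv_buf[i] == FILL_BYTE)
            ++stale;
    }
    return stale;
}
```

A non-zero count after a receive that the transport layer reported as
complete would point at data that was never written into the buffer, rather
than at data that arrived corrupted. Positions where the expected byte itself
equals the sentinel are not counted, so the fill value should be chosen to be
rare in the test data.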