On Jul 12, 2006, at 2:52 AM, Fco. Javier Ridruejo wrote:
> I am testing the NPB benchmarks on LAM in a fully simulated
> environment. Both the computing nodes and the network are simulated,
> using Simics for the nodes and a custom interconnection network
> simulator for the network.
>
> We stress the network to test congestion issues on it. But we are
> getting some faulty results, maybe due to timeouts in LAM. We are
> using the -ssi rpi lamd option to do all communication over UDP,
> because TCP congestion control skews our tests.
>
> Our network simulator does not lose any packets, but I think LAM
> timeouts are occurring. Maybe LAM is dropping packets due to full
> buffers? How can I adjust LAM's timeouts so that they do not occur?
>
> I have changed LAM_TO_DLO_ACK from 500000 to 50000000, but I think
> the application now takes much longer. I have also changed
> TO_DLO_ESTIMATE from 200 to 2000 and DOMAXPENDING from 3 to 30,
> without any success. What is the precise meaning of these variables?
>
> Now that the application takes longer, could it be losing packets and
> running longer because of the increased LAM_TO_DLO_ACK? To stress the
> network we have made it very slow, but injected packets never get
> lost; they are buffered in queues.
So there's a general assumption that packets aren't arbitrarily
delayed -- they're either delivered in a reasonable amount of time
(less than LAM_TO_DLO_ACK) or never delivered. If LAM_TO_DLO_ACK is
regularly missed on packet delivery, we're not going to do very well.
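To illustrate the assumption, here is a minimal sketch (hypothetical, not LAM's actual code) of what happens if LAM_TO_DLO_ACK is an ack timeout in microseconds and a sender retransmits each time the timeout expires before the ack arrives: once the network's round trip exceeds the timeout, every packet triggers spurious duplicate sends, which both inflates run time and adds load to an already-slow network.

```python
def spurious_resends(one_way_delay_us, ack_timeout_us):
    """Duplicate sends fired before the first ack can possibly arrive.

    Assumes a stop-and-wait sender that retransmits every time
    ack_timeout_us elapses without an ack (a simplification of how
    an ack-timeout scheme like LAM_TO_DLO_ACK might behave).
    """
    rtt_us = 2 * one_way_delay_us          # earliest ack arrival time
    return max(0, rtt_us // ack_timeout_us)

# With the default 500000 (500 ms) timeout, a slowed-down simulated
# network with a 400 ms one-way delay already causes duplicates:
print(spurious_resends(400_000, 500_000))     # -> 1 duplicate per packet
print(spurious_resends(400_000, 50_000_000))  # -> 0 after raising the timeout
```

This would be consistent with what you describe: raising the timeout stops the spurious retransmissions, but the run still takes longer simply because the network itself was made slower.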
What are you seeing that's causing problems? Without some
information as to exactly what you are seeing LAM do, I really can't
offer much useful advice.
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/