I have tested the same application with several values of LAM_TO_DLO_ACK.
With 500000 us (the default) the application runs for about 34 seconds of
simulated time. With 5000000 us I let it run for a long time and finally
stopped it, because it did not seem to finish (it ran far beyond 34
seconds). With 50000 us the application takes 12 seconds, and with
5000 us it takes 10 seconds.
All tests use the same network and computing node parameters.
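A quick back-of-the-envelope check on these figures: if each lost (or late) packet stalls the run for one full LAM_TO_DLO_ACK before it is resent, the runtime should grow roughly linearly with the timeout, T ≈ T0 + N * t_ack. The Python sketch below fits that hypothetical model to the three runs that finished. T0 (baseline runtime) and N (number of timeout-length stalls) are illustrative quantities, not anything LAM reports.

```python
# Hypothetical linear model: total runtime T = T0 + N * t_ack, where N is the
# number of timeout-length stalls on the critical path. The data points are
# the approximate simulated runtimes reported above.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# LAM_TO_DLO_ACK values (microseconds) and observed runtimes (seconds).
# The 5000000 us run never finished, so it is left out of the fit.
runs_us = [5_000, 50_000, 500_000]
runtime_s = [10.0, 12.0, 34.0]

t_ack_s = [us / 1e6 for us in runs_us]  # convert microseconds to seconds
t0, n_stalls = fit_line(t_ack_s, runtime_s)

print(f"baseline runtime T0 ~ {t0:.1f} s")
print(f"timeout-length stalls N ~ {n_stalls:.0f}")
# Hypothetical extrapolation to the run that was stopped.
print(f"predicted runtime at 5000000 us ~ {t0 + n_stalls * 5.0:.0f} s")
```

With the figures above the fit gives T0 ≈ 9.7 s and N ≈ 49, and it extrapolates to roughly 250 s for 5000000 us. A near-perfect linear fit like this is consistent with a roughly fixed number of packets being lost and recovered only by timeout, but it is only a rough model built on approximate runtimes.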
It seems clear that something is going wrong with packet delivery within
LAM. Since the simulated network is rather unusual, could it be overfilling
the buffers of the LAM daemons? Everything runs with -ssi rpi lamd, and the
tests are designed to saturate the network, because that is the phenomenon
we want to study.
Is there any way to disable the timeout check so that packets are not
resent, and to increase the buffer capacity of the lamd daemons, to avoid
this problem?
Thank you very much,
----------------------------------------
Francisco Javier Ridruejo Pérez
Red Académica i2BASK (UPV/EHU)
Parque Tecnológico de San Sebastián
Pº Mikeletegi, 69 - Torre Arbide Norte
20009 Donostia - San Sebastián
Tel.: +34 943 018 705
Fax.: +34 943 015 590
E-mail: franciscojavier.ridruejo_at_[hidden]
----------------------------------------
-----Original Message-----
From: Brian Barrett [mailto:brbarret_at_[hidden]]
Sent: Sunday, July 16, 2006 23:02
To: General LAM/MPI mailing list
Subject: Re: LAM: Timeouts
On Jul 12, 2006, at 2:52 AM, Fco. Javier Ridruejo wrote:
> I am testing the NPB benchmarks on LAM in a fully simulated environment.
> Both the computing nodes and the network are simulated, using Simics for
> the nodes and a custom interconnection network simulator for the network.
>
> We stress the network to study congestion in it, but we get some faulty
> results, possibly due to timeouts in LAM. We use the -ssi rpi lamd option
> to do all communication over UDP, because TCP congestion control distorts
> our tests.
>
> Our network simulator does not lose any packets, but I think LAM timeouts
> are occurring. Could LAM be dropping packets because of full buffers? How
> can I adjust the LAM timeouts so that they do not occur?
>
> I have changed LAM_TO_DLO_ACK from 500000 to 50000000, but I think the
> application now takes much longer. I have also changed TO_DLO_ESTIMATE from
> 200 to 2000 and DOMAXPENDING from 3 to 30, without success. What is the
> precise meaning of these variables?
>
> Now that the application takes longer, could it be losing packets and
> taking longer because of the increased LAM_TO_DLO_ACK? To stress the
> network we have made it very slow, but injected packets are never lost;
> they are buffered in queues.
So there's a general assumption that packets aren't arbitrarily
delayed -- they're either delivered in a reasonable amount of time
(less than LAM_TO_DLO_ACK) or never delivered. If LAM_TO_DLO_ACK is
regularly missed on packet delivery, we're not going to do very well.
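To make the consequence of that assumption concrete, here is a small, hypothetical Python model (not lamd's actual protocol): a sender keeps a fixed window of unacknowledged packets, resends any packet whose ACK has not arrived within a fixed timeout, and all copies share one lossless FIFO link. ACKs are assumed to return instantly, and the window, service time and timeout values are arbitrary. When the queueing delay stays below the timeout, every packet is sent exactly once; when it exceeds the timeout, duplicates queue up behind real traffic and the run slows down even though nothing is ever lost.

```python
import heapq

def simulate(n_packets, window, service, timeout):
    """Return (finish_time, transmissions) for a fixed-timeout sender over a
    lossless FIFO link that needs `service` time units per packet copy."""
    link_free = 0.0   # time at which the link finishes its current backlog
    acked = set()
    events = []       # heap of (time, kind, seq); "deliver" sorts before "timeout"
    next_seq = 0
    transmissions = 0
    now = 0.0

    def send(seq, t):
        nonlocal link_free, transmissions
        transmissions += 1
        start = max(t, link_free)          # wait behind everything already queued
        link_free = start + service
        heapq.heappush(events, (link_free, "deliver", seq))  # ACK assumed instant
        heapq.heappush(events, (t + timeout, "timeout", seq))

    # Fill the initial window.
    while next_seq < min(window, n_packets):
        send(next_seq, now)
        next_seq += 1

    while len(acked) < n_packets:
        now, kind, seq = heapq.heappop(events)
        if kind == "deliver":
            if seq not in acked:           # first copy to arrive: ACK, slide window
                acked.add(seq)
                if next_seq < n_packets:
                    send(next_seq, now)
                    next_seq += 1
        elif seq not in acked:
            # Timer expired before the ACK: resend, although the original copy
            # is still queued and will be delivered anyway (a spurious resend).
            send(seq, now)
    return now, transmissions

for timeout in (10.0, 3.0):
    finish, sent = simulate(30, 4, 1.0, timeout)
    print(f"timeout={timeout}: finished at t={finish}, {sent} transmissions")
```

With a window of 4 and one time unit per packet, the queueing delay is about 4 units, so the 10-unit timeout never fires on an unacknowledged packet (30 transmissions, finishing at t=30.0), while the 3-unit timeout produces duplicate transmissions and a later finish. Timeouts much shorter than the queueing delay make the backlog grow quickly, so keep the parameters small when experimenting.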
What are you seeing that's causing problems? Without some
information as to exactly what you are seeing LAM do, I really can't
offer much useful advice.
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/