
LAM/MPI General User's Mailing List Archives


From: Angel Tsankov (fn42551_at_[hidden])
Date: 2005-11-22 01:33:55


> On Nov 21, 2005, at 4:00 PM, Angel Tsankov wrote:
>
>> The multiple use of a persistent communication request does not
>> eliminate the startup time every time the request is used to send
>> data, does it?
>
> I'm not sure what you mean by "startup time" -- do you mean
> "overhead"?
>
> Using persistent requests does eliminate some of the overhead
> associated with sending / receiving. For the purposes of this
> conversation, I'm defining "overhead" as the time between when you
> invoke (for example) MPI_Send and when the data actually is sent to
> the
> peer (e.g., when the data "hits the wire").

Yes, this is exactly what I mean by "startup time" - the time it
takes to prepare the message and start sending it. As far as I know,
these activities cannot be overlapped (e.g. with computations).

In the meantime I performed some experiments - the following piece of
code was executed twice with number_of_communications = 10,000 (ten
thousand):

 for( unsigned int i = 0; i < number_of_communications; ++i )
 {
  /* Re-activate the persistent receive and send requests. */
  MPI_Start( &r_request );
  MPI_Start( &s_request );

  /* Wait for both transfers to complete before the next iteration. */
  MPI_Wait( &r_request, MPI_STATUS_IGNORE );
  MPI_Wait( &s_request, MPI_STATUS_IGNORE );
 }
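
For reference, here is a minimal sketch of how the two persistent
requests above could have been created and later released around that
loop; the buffer names, element count, tag, and peer rank here are my
assumptions, not the original code:

 #include <mpi.h>

 void exchange( double* sendbuf, double* recvbuf, int count, int peer,
                unsigned int number_of_communications )
 {
  MPI_Request s_request, r_request;

  /* Create the persistent requests once, before the timed loop. */
  MPI_Send_init( sendbuf, count, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, &s_request );
  MPI_Recv_init( recvbuf, count, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, &r_request );

  /* ... the MPI_Start / MPI_Wait loop shown above runs here ... */

  /* Release the persistent requests when they are no longer needed. */
  MPI_Request_free( &s_request );
  MPI_Request_free( &r_request );
 }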

The first time I used shared memory to transfer 128*128 (=16K)
elements of type double (8B in size each) from each process to the
other one. This took 10.7537s.
The second time I used 100BASE-T Ethernet to transfer a single element
of type double from each process to the other one. This took 1.23418s.
I transferred only a single element at each step in the second case in
order to measure the startup time. It can be seen that the startup
time (i.e. the overhead) is longer than the shared memory
communication itself. If one can completely overlap communications
with computations (as is the case with larger systems when I use
Ethernet), then this could be another possible reason for the solver
finishing its work in less time when communication is performed over
Ethernet, right?
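
Dividing those totals by the 10,000 iterations gives the approximate
per-exchange times (my arithmetic, from the figures above):

 10.7537 s / 10,000 ~ 1.08 ms  per exchange of 16K doubles (shared memory)
 1.23418 s / 10,000 ~ 0.123 ms per exchange of one double (Ethernet)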

Some more results in case someone is interested:
Transferring 32*32 (=1K) doubles using shared memory takes 0.557143s.

It seems that, if shared memory is used, communication time grows
(almost) linearly with the amount of data. I must admit that I
believed this time was fairly constant.
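
A quick check of that growth from the two shared-memory runs above (my
arithmetic): the amount of data grows 16x (1K -> 16K doubles), while
the total time grows about 19.3x (0.557143s -> 10.7537s), i.e. roughly
55.7 us versus 1.08 ms per exchange - so close to linear, but not
quite.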

Angel