LAM/MPI General User's Mailing List Archives

From: Xuehua Chen (cxh_at_[hidden])
Date: 2003-04-04 15:59:18


Hi,

Part of my work is evaluating MPI implementations. I found that LAM is
very efficient in some ways compared to other MPI implementations, so I
spent more effort on it. Here I want to report something that I found.

I read the source code of fast_send and fast_recv in the usysv transport
layer, and I think there may be a bug in the implementation of the
send and receive operations. I am not quite sure about it, so I hope
I can discuss it with someone who knows the details of the implementation.

In fast_send, the source node spins on the lock of the destination
node's outbox. If the outbox is empty (lock=0), the sender sets the
lock to 1 and copies a short message into it. This seems very efficient
at reducing the latency of short messages. But I think something could
go wrong with the current implementation, because the check and the set
are not one atomic step. Consider the following situation: process 1
and process 2 try to send a message to process 0 at the same time, and
the outbox of process 0 is empty. Process 1 arrives first, finds the
outbox empty, and goes on to the next step, setting the lock to 1. If
process 2 checks the lock before process 1 has set it, process 2 will
also get through and try to set the lock. In this situation, processes
1 and 2 both use the outbox of process 0 at the same time. Process 2
could overwrite what process 1 has written and cause unexpected
results. Because of this, I think we need a locking mechanism that
makes the whole procedure (spinning to check that the outbox is
unlocked and, if so, setting the lock) an atomic operation. As I have
read only parts of the LAM code, I am not sure whether I am right. I
would be happy if someone could tell me more about it and point out my
mistake, if there is one.
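
To make the suggestion concrete, here is a minimal sketch of an atomic
check-and-set, assuming a C11 compiler and a platform where atomic_int
is lock-free (so it also works in memory shared between processes).
This is not LAM's code: struct mailbox, mailbox_try_lock and
run_two_senders are hypothetical names, and a counter stands in for
the message copy. Two forked senders contend for one box in shared
memory, and the compare-and-exchange lets only one of them move the
lock from 0 to 1 at a time.

```c
#define _DEFAULT_SOURCE
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical mailbox layout; not LAM's actual data structure. */
struct mailbox {
    atomic_int lock;   /* 0 = free, 1 = taken */
    long counter;      /* stands in for the message payload */
};

/* Check and set in one atomic step: succeeds only if lock was 0. */
static int mailbox_try_lock(struct mailbox *box)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&box->lock, &expected, 1);
}

static void mailbox_lock(struct mailbox *box)
{
    while (!mailbox_try_lock(box))
        ;  /* spin until this process wins the 0 -> 1 transition */
}

static void mailbox_unlock(struct mailbox *box)
{
    atomic_store(&box->lock, 0);
}

/* Fork two senders that each update the box `iters` times.
 * Returns the final counter, or -1 on setup failure. */
long run_two_senders(long iters)
{
    struct mailbox *box = mmap(NULL, sizeof *box, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (box == MAP_FAILED)
        return -1;
    atomic_init(&box->lock, 0);
    box->counter = 0;

    pid_t pids[2];
    int started = 0;
    for (; started < 2; started++) {
        pids[started] = fork();
        if (pids[started] < 0)
            break;                       /* fork failed */
        if (pids[started] == 0) {        /* sender process */
            for (long i = 0; i < iters; i++) {
                mailbox_lock(box);
                box->counter++;          /* the "copy" into the box */
                mailbox_unlock(box);
            }
            _exit(0);
        }
    }
    for (int s = 0; s < started; s++)
        waitpid(pids[s], NULL, 0);

    long total = (started == 2) ? box->counter : -1;
    munmap(box, sizeof *box);
    return total;
}
```

Calling run_two_senders(100000) from a small main should return
200000 if no update is lost. If the lock were taken with a plain
"if (lock == 0) lock = 1;", both senders could pass the check
together, which is the situation described above.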

Another thing I would like to mention is the process-shared mutex.
On Red Hat 7.3, the RPM version of LAM-6.5.9 with the usysv transport
layer uses semaphores to lock the shared memory pool, which I think is
not as efficient as a process-shared mutex. Some ping-pong tests have
been done, and the results showed that a higher peak bandwidth can be
achieved when a process-shared mutex is used.

If we configure the LAM source code as follows:
  
configure --with-rpi=usysv --with-pthread-lock ...

I found that the process-shared mutex is actually not used. Checking
the output of configure, I found the following:
-------------------------------------------------
checking for process shared pthread mutex... no
ignoring --with-pthread-lock
-------------------------------------------------
But current Linux distributions (I know Red Hat 7.3 and Red Hat 8.0 do)
have process-shared mutexes implemented in glibc. Although man -k pthread
does not list the function, it does exist in pthread.h and can
be used. Results also show that it is more efficient than semaphores.
pthread.h belongs to glibc-devel-2.2.5-34 on Red Hat 7.3. We would
expect any version after 2.2.5-34 to have process-shared mutexes
available.
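
A small probe along the following lines can show whether a given
system really supports a process-shared mutex. It is only a sketch
using the standard POSIX calls (pthread_mutexattr_setpshared with
PTHREAD_PROCESS_SHARED). It is not the test that LAM's configure
runs, and struct shared_pool and run_pshared_mutex are hypothetical
names. The mutex lives in anonymous shared memory, and a parent and
a forked child both increment a counter under it.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a shared memory pool guarded by a mutex. */
struct shared_pool {
    pthread_mutex_t mutex;
    long counter;
};

/* Parent and child each increment the counter `iters` times.
 * Returns the final counter, or -1 if any setup step fails. */
long run_pshared_mutex(long iters)
{
    struct shared_pool *pool = mmap(NULL, sizeof *pool,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (pool == MAP_FAILED)
        return -1;
    pool->counter = 0;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(pool, sizeof *pool);
        return -1;
    }
    /* Ask for a mutex that is valid across process boundaries. */
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&pool->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(pool, sizeof *pool);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        pthread_mutex_destroy(&pool->mutex);
        munmap(pool, sizeof *pool);
        return -1;
    }

    /* Both the parent and the child run this loop. */
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&pool->mutex);
        pool->counter++;
        pthread_mutex_unlock(&pool->mutex);
    }
    if (child == 0)
        _exit(0);

    waitpid(child, NULL, 0);
    long total = pool->counter;
    pthread_mutex_destroy(&pool->mutex);
    munmap(pool, sizeof *pool);
    return total;
}
```

Built with something like "cc -pthread", a call to
run_pshared_mutex(100000) should return 200000 where process-shared
mutexes work, and -1 where the attribute is rejected.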

Also, I would like to say that based on some tests, LAM is really an
efficient and reliable MPI implementation.

Xuehua Chen

Graduate Student in Computer Engineering
Iowa State University