
LAM/MPI General User's Mailing List Archives


From: David Losada (david.losada_at_[hidden])
Date: 2006-04-27 17:14:26


Hello,

I'm programming on a multi-processor machine and have a problem with the
SYSV rpi. I have searched and browsed the list archives without finding a
remedy. Maybe someone can help me? My problem is the following:

In my MPI program, one process is devoted to serving requests from other
processes. I'm using MPI_Recv calls for this server process, so it stays
blocked waiting for the requests. Because of this blocking behavior, I
was expecting the process to use little CPU.

However, when I run my program, the opposite happens: the process uses
100% of one of the CPUs in the system. The 'top' utility shows that
approximately 25% of this time is spent in user code and 75% in system
code.

Running strace on the process shows the following system calls being performed:

semop(42467330, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)
semop(42500099, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)
sched_yield() = 0
semop(42467330, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)
semop(42500099, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)
sched_yield() = 0
semop(42467330, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)
semop(42500099, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)
sched_yield() = 0
semop(42467330, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)
semop(42500099, 0x2a9599a898, 182898352496) = -1 EAGAIN (Resource temporarily unavailable)

And so on. This trace is from a run with 3 processes (1 server, 2
clients).

As I understand it, the spinning comes from LAM making non-blocking
semaphore operations. I also see that LAM uses a different semaphore set
for the communication between each pair of processes. So, since I
programmed my server to be equally available to receive messages from
two different clients, LAM can't block on either semaphore set.

Anyway, nothing in the manual pages led me to expect such spinning
behavior, and I can see this being quite a common pitfall. Maybe
there's a workaround I don't know of? Maybe I can change something in my
MPI calls? Does anyone have a suggestion (apart from using the TCP rpi)
that will save me this waste of CPU time?

kind regards,

David Losada