On Mar 25, 2008, at 2:32 PM, Luttinger, Matthew wrote:
> When I build on Red Hat 4.3 using LAM 7.0.6, my processes use very
> little CPU when sitting idle at MPI_Recv.
>
> When I build on my target hardware (Red Hat 4.6, LAM 7.1.2), my
> processes use 100% of the CPU just sitting and waiting for a message
> at MPI_Recv.
>
> To make it stranger, if I take my processes built on Red Hat 4.3 /
> LAM 7.0.6 and run them on Red Hat 4.6 / LAM 7.1.2, they do not use
> 100% of the CPU; they behave as I expect. It is only when I build on
> Red Hat 4.6 / LAM 7.1.2 that they use 100% of the CPU.
>
> Any ideas?
>
LAM/MPI has a number of different transport engines ("rpi" modules) it
can use under the covers -- tcp, sysv (blocking shared memory + tcp),
usysv (polling shared memory + tcp), and gm (Myrinet/GM). If the usysv
rpi is in use, blocking sends/receives result in hard polling and 100%
CPU utilization. If the sysv rpi is in use, the process can block
instead of polling, but only when its communication is entirely on
node or entirely off node. If there is a mix (including an ANY_SOURCE
receive), LAM must poll between the TCP and shared memory channels.
This includes the case where the application posts an Irecv off node
and then a Recv on node, or vice versa; the same holds for Sends (see
the sketch below). The tcp transport should never poll and will only
use CPU when communication is actively taking place. I believe the GM
transport will end up polling when there is active communication.
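To make the mixed case concrete, here is a minimal sketch (mine, not
from your code; it assumes ranks 0 and 1 share a node and rank 2 is on
a different node). Under the sysv rpi, rank 0's blocking Recv has to
watch both the shared memory and TCP channels at once, so it spins:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, on_node = 0, off_node = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Post a nonblocking receive for the off-node message
         * (carried over the TCP channel under the sysv rpi)... */
        MPI_Irecv(&off_node, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, &req);

        /* ...then block in a receive for the on-node message (shared
         * memory channel).  With both channels pending, LAM must poll
         * between them, so this Recv burns 100% CPU until a message
         * arrives. */
        MPI_Recv(&on_node, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);

        MPI_Wait(&req, &status);
        printf("got %d on node, %d off node\n", on_node, off_node);
    } else if (rank == 1 || rank == 2) {
        /* Ranks 1 and 2 each send one integer to rank 0. */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Under the tcp rpi, the same program would sleep in the Recv instead of
spinning.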
If your two platforms are configured differently (for example, one
process per node on one and multiple processes per node on the other)
or support different devices, LAM may be selecting different rpi
modules on each, which would account for the different behavior you
are seeing.
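One quick way to check: LAM 7.x lets you list the available SSI
modules and force a particular rpi at run time. Something along these
lines should work (./a.out stands in for your application, and the
exact flags may vary with your version):

  laminfo                           # list available modules, incl. rpis
  mpirun -ssi rpi tcp C ./a.out     # force the non-polling tcp rpi
  mpirun -ssi rpi sysv C ./a.out    # force blocking shared memory + tcp

If forcing the same rpi on both platforms makes the CPU usage match,
the difference was just module selection.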
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!