On Jun 22, 2005, at 3:48 PM, Ross Heikes wrote:
>>
>> sched_yield is part of the operating system, as is switch_priority.
>> Both are functions to get the operating system to switch the current
>> process off the CPU so that someone else can run. LAM will call these
>> functions when it is waiting for information from its shared memory
>> transport.
>>
>> If your application appears to be hung in one of these functions, look
>> farther up the call stack - you should see an MPI function in there
>> somewhere, which is the function call your application made before it
>> hung. That should at least give you a place to look for the next
>> step.
>
> Well, the call stack shows a call to MPI_WAITANY. But this call is made
> many times in the program, and it hangs at different points in the
> program flow (though at the same line in the same program).
> Is this a LAM problem or an OS (Apple) problem?

This sounds like an application deadlock problem, not a LAM or OS
problem. Unless you are using the USYSV rpi (as I mentioned earlier),
deadlocks in MPI_Waitany are usually the result of either receives never
being posted for the sends you are waiting on, or sends never being
posted for the receives you are waiting on. I'd start by looking at
the requests you are waiting on - can they be fulfilled based on the
pending communication on other nodes? TotalView has a rich set of
functionality for looking at communication patterns, although I'm not
sure how well it works with non-blocking communication. But if you can
trace through your pending requests, you should be able to figure out
where the deadlock is coming from.
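
To make "trace through your pending requests" a bit more concrete, here
is a minimal sketch in plain C (no MPI calls) of the bookkeeping involved.
It assumes you have written down every posted send and receive as a
(rank, peer, tag) record; the names posted_op, find_unmatched and MAX_OPS
are made up for this illustration and are not part of LAM/MPI. It also
ignores wildcards such as MPI_ANY_SOURCE and MPI_ANY_TAG, so treat it as
a starting point rather than a real tool.

```c
#define MAX_OPS 64  /* arbitrary cap for this sketch */

/* One posted point-to-point operation, as noted down while tracing a
 * hang. Hypothetical record type, not a LAM/MPI structure. */
typedef struct {
    int is_send;  /* 1 = send, 0 = receive */
    int rank;     /* rank that posted the operation */
    int peer;     /* destination (for a send) or source (for a receive) */
    int tag;      /* message tag */
} posted_op;

/* A send matches a receive when each names the other as its peer and
 * the tags agree. Wildcards are deliberately not handled. */
static int matches(const posted_op *s, const posted_op *r)
{
    return s->is_send && !r->is_send &&
           s->peer == r->rank && r->peer == s->rank && s->tag == r->tag;
}

/* Pair each send with at most one receive, then store the indices of
 * the operations left without a partner in unmatched[] (which must hold
 * at least n entries). Returns how many were left over, or -1 if n is
 * out of range. */
int find_unmatched(const posted_op *ops, int n, int *unmatched)
{
    int paired[MAX_OPS] = {0};
    int i, j, count = 0;

    if (n < 0 || n > MAX_OPS)
        return -1;

    for (i = 0; i < n; i++) {
        if (!ops[i].is_send)
            continue;
        for (j = 0; j < n; j++) {
            if (!paired[j] && matches(&ops[i], &ops[j])) {
                paired[i] = paired[j] = 1;
                break;
            }
        }
    }

    for (i = 0; i < n; i++)
        if (!paired[i])
            unmatched[count++] = i;

    return count;
}
```

Any index that comes back in unmatched[] is a request that nothing on the
other side will ever complete. If one of those is the only request left
in the array you pass to MPI_Waitany, that call will never return.
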

I'd look at your application before looking at LAM/MPI or the OS, as
these problems are traditionally issues with the MPI application
itself. These can be much easier to debug if you are using TCP instead
of a combination of TCP and shared memory, so you might want to try
adding "-ssi rpi tcp" to your mpirun options.

Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/