LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-05-27 16:53:02


On May 27, 2005, at 1:15 PM, raregue wrote:

> I'm getting the following screen output after my simulation has not
> converged a few times (I haven't counted how many times it takes).
> lamboot works fine, along with mpirun, and so forth. But when my
> simulation ends a few times due to the solution not converging,
> eventually mpirun will not work. To remedy this, I need to reboot my
> G5, which is something I'd like to avoid. Any ideas?

Yoinks -- that should clearly not happen.

Are your MPI applications crashing, perchance? There's not enough
information here to know which RPI you're using, but I'm guessing it's
one of the shared memory RPIs.

What I suspect is happening here is that your apps are exiting
uncleanly and shared memory is being left allocated in the OS. The
"lamclean" command should go through and release all this allocated
memory, and then mpirun should start working again.
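
If the crashes recur, one way to avoid running the cleanup by hand is a small wrapper that launches the job and calls "lamclean" whenever the job exits with a non-zero status. The following is only a sketch under assumptions: the helper name run_then_clean is made up for illustration, the "./my_sim" binary and "-np 4" in the usage comment are placeholders, and it assumes a non-zero exit status is a good enough signal that resources may have been left behind.

```python
import subprocess
import sys


def run_then_clean(run_cmd, clean_cmd=("lamclean",)):
    """Run run_cmd; if it exits non-zero, run clean_cmd afterwards.

    Hypothetical helper, not part of LAM/MPI. Returns a tuple
    (exit_code, cleaned), where cleaned reports whether the cleanup
    command was invoked.
    """
    result = subprocess.run(list(run_cmd))
    if result.returncode == 0:
        # Clean exit: assume nothing was left behind.
        return result.returncode, False
    # Non-zero exit (including death by signal, which subprocess
    # reports as a negative code): run the cleanup command.
    subprocess.run(list(clean_cmd), check=False)
    return result.returncode, True


# Example usage (placeholders: "-np 4" and "./my_sim"):
#   code, cleaned = run_then_clean(["mpirun", "-np", "4", "./my_sim"])
#   if cleaned:
#       print("job failed with status", code, "- ran lamclean", file=sys.stderr)
```

This only automates the workaround. If the application can detect non-convergence itself, having it shut down through the normal MPI exit path is the better long-term fix.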

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/