LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-02-04 09:25:40


On Wed, 4 Feb 2004, P. W. Leung wrote:

> Recently I encountered a problem using LAM 7 together with OpenMosix. To
> make it simple, I install RH9 on a P4 PC and patch it with OpenMosix
> 2.4.22-2 rpm. Since there is only one node, no process will be migrated
> by OpenMosix. I then run an MPI program compiled with LAM 7.0.4. This
> program uses one process only and therefore no MPI communication occurs.
> I find that as the program runs, it suddenly takes up more and more
> memory until it exceeds 2GB and dies. This is reproducible and when I run
> it in gdb, I get the error messages as attached. It seems to point to
> the pthread library. Note that this problem occurs only when I use LAM 7
> and OpenMosix together. In particular, there is no problem under the
> following situations:
>
> 1. when I turn off OpenMosix by "/etc/rc.d/init.d/openmosix stop" and
> run an MPI program compiled with LAM 7.0.4.
> 2. when I have OpenMosix on but run a serial (non-MPI) program.
> 3. when OpenMosix is on and I run an MPI program compiled with LAM 6.5.9.
>
> Finally, I find that the same problem occurs if I run HPL from netlib.
> So this is not a problem with our program only.

From your gdb bt, it looks like we're just calling free() on an MPI
request that has finished.

Can you run this program through valgrind to see if there is other memory
badness occurring?

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/