LAM/MPI General User's Mailing List Archives

From: Qikai Li (qikai.li_at_[hidden])
Date: 2004-10-14 15:22:04


I've also tried with the option "-ssi rpi tcp"; the same problem exists
under LAM 7.1.

Normally, we run mpirun like the following:
mpirun -ssi rpi gm -np 8 /crunch/qkli/bin/test

Since our cluster consists of dual-CPU nodes, running the test with -np 2
(on a single node) shows no memory leak. Starting from -np 4 (using two
nodes), the memory leak occurs.
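
For reference, a stripped-down loop along the following lines (just a
sketch, not our actual test code) is enough to watch the growth once the
two ranks land on different nodes:

#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buf[65536];
    int rank, size, i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(buf, 0, sizeof(buf));

    /* bounce a 64 KB message back and forth; watch each rank's RSS
       with top/ps while this runs */
    for (i = 0; i < 100000; i++) {
        if (rank == 0 && size > 1) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}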

The following is a snapshot of laminfo:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

qkli@apollo01:~> laminfo
             LAM/MPI: 7.1
              Prefix: /usr/rels/lam
        Architecture: x86_64-unknown-linux-gnu
       Configured by: root
       Configured on: Sun Sep 19 19:49:57 EDT 2004
      Configure host: apollo01
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: gcc
        C++ compiler: g++
    Fortran compiler: pgf90
     Fortran symbols: underscore
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI boot: tm (API v1.1, Module v1.1)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: gm (API v1.1, Module v1.2)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)
              SSI cr: self (API v1.0, Module v1.0)
qkli@apollo01:~>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I've also noticed a strange behavior under SuSE Linux 9.1 (64-bit): for
example, if you use malloc and free in a subroutine and call the
subroutine repeatedly, the memory does not seem to be freed properly.
This happens only in the 64-bit environment.
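
For what it's worth, the pattern is roughly like this (only a sketch of
the kind of subroutine I mean, not the actual code):

#include <stdlib.h>
#include <string.h>

/* allocate, touch, and free a buffer -- nothing should be retained */
static void work(size_t n)
{
    char *p = malloc(n);
    if (p != NULL) {
        memset(p, 0, n);
        free(p);
    }
}

int main(void)
{
    long i;
    for (i = 0; i < 1000000; i++)
        work(1 << 20);   /* 1 MB per call; watch RSS with top or ps */
    return 0;
}

Watching the resident size with top while a loop like this runs is how we
noticed that the memory does not come back down.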

Best regards,

Qikai
On Thu, 2004-10-14 at 14:20, Brian Barrett wrote:
> On Oct 14, 2004, at 9:37 AM, Qikai Li wrote:
>
> > The same code runs perfectly under LAM 7.0.6 with a stable memory
> > usage.
> >
> > Several guys in our group have experienced the same problem when I
> > switched the LAM from 7.0.6 to 7.1.
> >
> > Maybe this is related to a possible bug in gcc, i.e., the memory is NOT
> > properly freed in a 64-bit environment even though you have used, for
> > example, matching pairs of malloc (or calloc) and free.
> >
> > Also, the problem seems to be 64-bit specific.
> >
> > Or maybe it's the problem of LAM 7.1.
>
> Thanks for the bug report. We only have access to one Opteron machine
> and it doesn't have Myrinet, so I was wondering if you could run a
> couple tests for me to help localize the problem. First, could you
> send me the output from the "laminfo" command? There are a number of
> places that changed between 7.0 and 7.1, so I'm hoping we can localize
> it to a particular component. Does the memory leak happen regardless of
> the number of processes running?
>
> Also, could you see if it happens with the following SSI options:
>
> -ssi rpi tcp (use tcp instead of gm)
> -ssi coll lam_basic (use the really simple collectives code)
>
> You specify the ssi params during mpirun, so something like: "mpirun
> -np 4 -ssi rpi tcp ./a.out"
>
> I'm looking at the problem as well, but having some starting points
> would really help.
>
> Thanks!
>
> Brian

-- 
Qikai Li
School of Materials Science and Engineering
Georgia Institute of Technology
771 Ferst Drive N.W.
Atlanta, Georgia  30332
Email: qikai.li_at_[hidden]
Phone: 404-385-2852