LAM/MPI General User's Mailing List Archives

From: J G Che (jgche_at_[hidden])
Date: 2006-04-28 21:21:09


Thanks!
I recompiled with the memory option you suggested (see config.log) and typed bt after gdb stopped. The output is below:

jgche: ~/lam-test>lamboot -v lamhosts
Segmentation fault
jgche: ~/lam-test>gdb lamboot
GNU gdb Red Hat Linux (5.2.1-4)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) bt
No stack.
(gdb) run
Starting program: /people/jgche/lam-7.1.2-debug/bin/lamboot
[New Thread 8192 (LWP 28340)]

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 8192 (LWP 28340)]
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
(gdb) quit
The program is running. Exit anyway? (y or n) y
jgche: ~/lam-test>laminfo
             LAM/MPI: 7.1.2
Segmentation fault
jgche: ~/lam-test>

No output file appeared in the directory.

Myrinet is installed on the cluster; however, I did not use the gm option when compiling LAM. I have also tried compiling it with the gm switches, and the problem is the same.

JG
  ----- Original Message -----
  From: Jeff Squyres (jsquyres)
  To: General LAM/MPI mailing list
  Sent: Friday, April 28, 2006 8:17 PM
  Subject: Re: LAM: can gcc 3.2 and kernel 2.4.20 suit lam-7.1.2 or not? Or other problem for lam-7.1.2?

  Where gdb stops and gives you the "(gdb)" prompt, type "bt" and hit enter. This will give us a backtrace and show us exactly where it stopped.

  Can you send the output of laminfo? If laminfo fails to run, can you configure LAM with the following configure switch: --with-memory-manager=none. This *feels* like a memory manager problem, but the environment you listed should not be a problem (gcc 3.2, kernel 2.4.20). Are you using a high speed network such as Myrinet or Infiniband?
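
  A rough sketch of the two steps above, in case it helps. The function names and the bt.gdb file name are made up for illustration, and the prefix shown in the comments is an assumption; only the --with-memory-manager=none switch comes from this thread. A gdb command file is used here because it should also work with older gdb releases.

  ```shell
  # Sketch only: run from the top of the unpacked LAM source tree.

  # Reconfigure without LAM's memory manager; extra configure
  # arguments (e.g. --prefix=...) are passed straight through.
  reconfigure_no_memmgr() {
      ./configure --with-memory-manager=none "$@" && make && make install
  }

  # Write a gdb command file that runs lamboot and then prints the backtrace.
  # $1 = name of the command file to create (hypothetical name: bt.gdb).
  write_gdb_cmds() {
      printf 'run -v lamhosts\nbt\n' > "$1"
  }

  # Typical use (the prefix is an assumption, adjust to your install):
  #   reconfigure_no_memmgr --prefix="$HOME/lam-7.1.2-debug"
  #   write_gdb_cmds bt.gdb
  #   gdb -batch -x bt.gdb "$HOME/lam-7.1.2-debug/bin/lamboot"
  ```

  Typing "run" and then "bt" by hand at the "(gdb)" prompt gives the same information; the command file only saves retyping.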

----------------------------------------------------------------------------
    From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of J G Che
    Sent: Friday, April 28, 2006 1:08 AM
    To: General LAM/MPI mailing list
    Subject: Re: LAM: can gcc 3.2 and kernel 2.4.20 suit lam-7.1.2 or not? Or other problem for lam-7.1.2?

    Thanks! I tried:

    jgche: ~/lam-test>lamboot
    Segmentation fault
    jgche: ~/lam-test>gdb lamboot
    GNU gdb Red Hat Linux (5.2.1-4)
    Copyright 2002 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux"...
    (gdb) run
    Starting program: /people/jgche/lam-7.1.2-debug/bin/lamboot
    [New Thread 8192 (LWP 24731)]

    Program received signal SIGTRAP, Trace/breakpoint trap.
    [Switching to Thread 8192 (LWP 24731)]
    0x00000000 in ?? ()
    (gdb)

    I don't know how to go on. Could you please give me more detail?

    I have also tried installing lam-7.0.4; it fails the same way, with "Segmentation fault".

    JG

      ----- Original Message -----
      From: Jeff Squyres (jsquyres)
      To: General LAM/MPI mailing list
      Sent: Thursday, April 27, 2006 7:17 PM
      Subject: Re: LAM: can gcc 3.2 and kernel 2.4.20 suit lam-7.1.2 or not? Or other problem for lam-7.1.2?

      This is certainly quite odd and should not happen.

      Can you try running "lamboot -d lamhosts" with 7.1.2? That might give a bit more output.

      If that doesn't reveal anything useful, could you recompile LAM with debugging symbols enabled (e.g., "./configure CFLAGS=-g ...."), ensure that your coredumpsize is unlimited, and run it again? This should then generate a corefile -- if you could send the backtrace from that, it would be most useful.
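
      For reference, a minimal sketch of that sequence. It assumes a Bourne-style shell (under csh/tcsh the equivalent of "ulimit -c unlimited" is "limit coredumpsize unlimited"); the function names, the temporary file, and the paths in the comments are illustrative assumptions, not part of LAM.

      ```shell
      # Sketch only: run from the top of the unpacked LAM source tree.

      # Build LAM with debugging symbols; extra configure arguments
      # (e.g. --prefix=...) are passed straight through.
      build_with_debug() {
          ./configure CFLAGS=-g "$@" && make && make install
      }

      # Print the backtrace from a core file.
      # $1 = the binary that crashed, $2 = the core file it left behind.
      core_backtrace() {
          cmds="${TMPDIR:-/tmp}/bt.$$.gdb"
          echo bt > "$cmds"
          gdb -batch -x "$cmds" "$1" "$2"
          rm -f "$cmds"
      }

      # Typical use (paths are assumptions; the core file may be named
      # "core" or "core.<pid>" depending on the system):
      #   build_with_debug --prefix="$HOME/lam-7.1.2-debug"
      #   ulimit -c unlimited
      #   lamboot -d lamhosts
      #   core_backtrace "$HOME/lam-7.1.2-debug/bin/lamboot" core
      ```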

      Thanks!

------------------------------------------------------------------------
        From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of J G Che
        Sent: Thursday, April 27, 2006 1:45 AM
        To: General LAM/MPI mailing list
        Subject: LAM: can gcc 3.2 and kernel 2.4.20 suit lam-7.1.2 or not? Or other problem for lam-7.1.2?

        I cannot get lam-7.1.2 installed and working on our cluster (dual Xeon nodes with Myrinet). The gcc version is:

        jgche: ~>gcc -v
        Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2/specs
        Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --host=i386-redhat-linux --with-system-zlib --enable-__cxa_atexit
        Thread model: posix
        gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)

        The kernel seems to be 2.4.20-28.8smp. (I'm not the administrator, and the administrator will not install LAM/MPI, so I want to install it for myself.)

        I compiled lam-7.1.2 without problems; please see the attached config.7.1.2.log and make.7.1.2.log. However, when I run lamboot, I get:

        jgche: ~>cat lamhosts
        admin1
        jgche: ~>lamboot -v lamhosts
        Segmentation fault
        jgche: ~>

        Except for mpif77, mpicc, and mpic++, every other executable I run in /people/jgche/lam-7.1.2-eth/bin gives "Segmentation fault"! I could not fix the problem, so I tried installing lam-6.5.7, since I thought this version was released in Oct 2002, at almost the same time as gcc 3.2. That one seems to be OK:

        jgche: ~>rm lam-eth
        jgche: ~>ln -s lam-6.5.7-eth/ lam-eth
        jgche: ~>lamboot -v lamhosts

        LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University

        Executing hboot on n0 (admin1 - 1 CPU)...
        topology done

        Please also refer to the attached config.6.5.7.log and make.6.5.7.log.

        What is causing this problem? Is it the gcc version, the kernel, or something else? How can I fix it?

        Thanks!

        JG

--------------------------------------------------------------------------

      _______________________________________________
      This list is archived at http://www.lam-mpi.org/MailArchives/lam/

------------------------------------------------------------------------------
