Sorry about the slow reply -- LAM/MPI is unfortunately a side project
for me these days, and things slip through the cracks.
It looks like there's something bad happening between the memory
manager code in LAM/MPI and the Intel compilers on your platform. If
you aren't using Myrinet, the memory manager's really just in the way,
so I'd recommend trying to rebuild LAM/MPI without it (use the
--without-memory-manager option to configure). That hopefully will
clear things up for you.
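
A rebuild along these lines is roughly what I have in mind. Treat it as a sketch that I have not run on your setup: the install prefix and the CC/CXX settings below are assumptions, so reuse whatever you passed to configure before and simply add the --without-memory-manager flag.

```shell
# Hypothetical install prefix, chosen so the old install is left untouched.
PREFIX="$HOME/package/lam_7.1.4_nomm"

# Run this from the top of the unpacked LAM/MPI 7.1.4 source tree.
if [ -x ./configure ]; then
    # CC/CXX are assumed Intel compiler names; keep your original settings.
    ./configure CC=icc CXX=icpc \
        --prefix="$PREFIX" \
        --without-memory-manager \
        && make && make install
else
    echo "No ./configure here; cd into the LAM/MPI source directory first." >&2
fi
```

After installing, recompile hello.c with the mpicc from the new prefix's bin directory and run it again.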
Brian
On Nov 26, 2007, at 2:53 PM, GQ Chen wrote:
> Here are some snapshots from the GDB debugger (I am running the program
> on the local node). I set a breakpoint at the function MPI_Init.
>
> baggins3,guchen $ ~/package/lam_7.1.4_intel_10.0.23/bin/mpicc -g
> hello.c -o hello_g
> baggins3,guchen $ gdb ./hello_g
> GNU gdb Red Hat Linux (6.3.0.0-1.63rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> libthread_db library "/lib64/tls/libthread_db.so.1".
>
> (gdb) set width 70
> (gdb) break hello_g_MPI_Init
> Function "hello_g_MPI_Init" not defined.
> Make breakpoint pending on future shared library load? (y or [n])
> (gdb) break MPI_Init
> Breakpoint 1 at 0x804ac7c
> (gdb) run
> Starting program: /home/guchen/packages/lam-7.0.6/examples/hello/
> hello_g
> Reading symbols from shared object read from target memory...done.
> Loaded system supplied DSO at 0xffffe000
> [Thread debugging using libthread_db enabled]
> [New Thread 4160730816 (LWP 16152)]
> hello1
> [Switching to Thread 4160730816 (LWP 16152)]
>
> Breakpoint 1, 0x0804ac7c in MPI_Init ()
> (gdb) n
> Single stepping until exit from function MPI_Init,
> which has no line number information.
> 0x0805bbea in lam_setfunc ()
> (gdb) n
> Single stepping until exit from function lam_setfunc,
> which has no line number information.
> 0x080905f4 in lam_arr_init ()
> (gdb) n
> Single stepping until exit from function lam_arr_init,
> which has no line number information.
> 0x080618c6 in malloc ()
> (gdb) n
> Single stepping until exit from function malloc,
> which has no line number information.
> 0x080618ca in malloc. ()
> (gdb) n
> Single stepping until exit from function malloc.,
> which has no line number information.
> 0x080610e4 in malloc_hook_ini ()
> (gdb) n
> Single stepping until exit from function malloc_hook_ini,
> which has no line number information.
> 0x0804a0ec in ?? ()
> (gdb) n
> Cannot find bounds of current function
> (gdb) n
> Cannot find bounds of current function
> (gdb) n
> Cannot find bounds of current function
> (gdb) bt
> #0 0x0804a0ec in ?? ()
> #1 0x0806113c in malloc_hook_ini ()
> #2 0x00a5a4f8 in ?? ()
> #3 0x00000001 in ?? ()
> #4 0xffffd124 in ?? ()
> #5 0x08061b83 in malloc. ()
> #6 0x00b82ff4 in ?? () from /lib/tls/libc.so.6
> #7 0xffffd098 in ?? ()
> #8 0xffffd124 in ?? ()
> #9 0xffffd0b0 in ?? ()
> #10 0x080905fc in lam_arr_init ()
> #11 0xffffd098 in ?? ()
> #12 0x0805bc6f in lam_setfunc ()
> #13 0x00000004 in ?? ()
> #14 0x00000000 in ?? ()
> (gdb) s
> Cannot find bounds of current function
>
>
>
> On Nov 26, 2007 9:27 AM, Brian W. Barrett <brbarret_at_[hidden]>
> wrote:
>> If you can attach a debugger to one of the MPI processes that is
>> "hung",
>> that might be useful. Unfortunately, when it works with one
>> compiler and
>> not the other, those are usually pretty hard bugs to find.
>>
>> Brian
>>
>>
>> On Mon, 26 Nov 2007, GQ Chen wrote:
>>
>>> Brian,
>>>
>>> I downloaded LAM 7.1.4 and compiled it with Intel ICC 10.0.23.
>>> Still the same problem as with LAM 7.0.6. It just hangs before
>>> MPI_Init.
>>> Any thoughts? Thanks,
>>>
>>> Guoquan
>>>
>>> On Nov 25, 2007 11:00 PM, Brian Barrett <brbarret_at_[hidden]>
>>> wrote:
>>>> On Nov 25, 2007, at 6:17 PM, GQ Chen wrote:
>>>>
>>>>> I have an AMD Opteron system with dual CPUs, running CentOS 4.2
>>>>> (x86_64) with Linux kernel 2.6.9. I downloaded LAM 7.0.6 and
>>>>> compiled it with Intel compiler 10.0.23 with the following
>>>>> configuration
>>>>
>>>> Can you try upgrading to LAM/MPI 7.1.4? There are some known
>>>> issues
>>>> with 7.0.6 and it is no longer supported.
>>>>
>>>> thanks,
>>>>
>>>> Brian
>>>>
>>>> --
>>>> Brian Barrett
>>>> LAM/MPI Developer
>>>> Make today a LAM/MPI day!
>>>>
>>>>
>>>> _______________________________________________
>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>>
>>>
>>>
>>
>> --
>>
>> Brian Barrett
>> LAM/MPI Developer
>> Make today a LAM/MPI day!
>>
>
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!