Based on your stack trace, it looks like something is getting
corrupted. Unfortunately, I do not have access to any FreeBSD machines
to test this myself. However, laminfo is clearly not getting very far.
Can you use gdb's step/next commands to walk through laminfo from the
beginning of the program and see exactly where it fails?
On Nov 20, 2005, at 10:49 AM, <v.kuznetsov_at_[hidden]> wrote:
> Hi Jeff,
> Thank you for your answer,
> Below is the result of running laminfo under GDB. First, a note about the
> environment: it is a stable production system with 1 month of uptime, so it
> is almost impossible for the problem to be a malfunctioning libc library.
> But you would know better :). Moreover, I tried LAM-MPI on 2 different
> systems (both built from the same distribution, however) and got the same
> result: laminfo hangs.
> Here is the GDB session for laminfo:
>
> %gdb laminfo
> GNU gdb 4.18 (FreeBSD)
> Copyright 1998 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at
> /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 627 in elfstab_build_psymtabs
> Deprecated bfd_read called at
> /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf
>
> (gdb) run
> Starting program: /usr/local/bin/laminfo
> ^C
> Program received signal SIGINT, Interrupt.
> 0x48197900 in __sys_poll () from /usr/lib/libc_r.so.4
> (gdb) bt
> #0 0x48197900 in __sys_poll () from /usr/lib/libc_r.so.4
> #1 0x48196e4c in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
> #2 0x48196811 in _thread_kern_scheduler () from /usr/lib/libc_r.so.4
> #3 0x0 in ?? ()
> (gdb)
>
>> -----Original Message-----
>> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
>> Sent: Saturday, November 19, 2005 4:48 AM
>> To: General LAM/MPI mailing list
>> Subject: Re: LAM: FW: FreeBSD MPI_INIT problem
>>
>> Sorry for the delay -- SC kept us quite busy over the last 2 weeks. :-(
>>
>> Hum. This is quite puzzling; I'm concerned that even laminfo hangs
>> (I've never seen that before). Indeed, laminfo is not even an MPI
>> process -- there's not a whole lot that's going on in there.
>>
>> Can you attach to laminfo and see where, exactly, it is hanging? If
>> you get a corrupted stack, can you run laminfo in gdb directly and see
>> where it gets fouled up?
>
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/