On Jun 11, 2004, at 1:47 PM, Richard Hadsell wrote:
> Richard Hadsell wrote:
>
>> I'll try to build a -g version of libdl, so that I can step through
>> the dlerror call. I don't know how to do that, yet, but I'll try to
>> get some help.
>
> This may be my last contribution on this thread, because I'm stuck. I
> built a debug version of libdl and stepped into the code.
>
> The call to __libc_getspecific on line 53 returns a valid pointer.
> The dl_action_result struct has 0 for data members errcode, returned,
> and objname, but its errstring pointer is bad (0x00000089). The seg
> fault happens after it goes into the call to __asprintf on lines 71-73
> with the bad pointer in buf.
>
> I can't step into __libc_getspecific, even with a debug version of the
> pthreads library. I have no idea where it is going, and I probably
> couldn't figure out what's happening with the thread-specific data
> anyway. So I'm stuck.
>
> I still think there might be a problem in LAM code somewhere. Does
> anyone know whether mpirun or lamd use any thread-specific data? It's
> way beyond me at the moment.

The lightbulb goes on!

LAM 7.0, in anticipation of a thread-hot MPI, enables thread support
by default. We only support THREAD_SINGLE through THREAD_SERIALIZED, but
that's enough to require some locking support. So LAM 7.0 links in all
the pthread code (and mpicc/mpiCC/mpif77 add the pthread libraries as
well). Versions of LAM before 7.0 probably did not do this.

You might try compiling LAM 7.0 with the --without-threads configure
flag and see if that solves the problem. If it does, that would lead me
to believe that the issue is just a difference in how the dl library
behaves when pthread support is linked in.
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/