
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-08-22 17:09:12


FWIW, we use a different memory manager scheme in Open MPI -- we use the
built-in glibc hooks if they are available (and threading is not in use).
We use a slightly different scheme when threading is used, but there
should still be no collisions.
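
A minimal sketch of the glibc hook mechanism mentioned above (illustrative
only -- not Open MPI's actual code): glibc exports __malloc_hook and
__free_hook so that a library can interpose on allocations, for example to
track memory that has been registered with an interconnect.

// malloc_hook_sketch.cc -- hypothetical use of glibc's (now-deprecated)
// __malloc_hook to intercept allocations; build with plain g++.
#include <malloc.h>

static void *(*prev_malloc_hook)(size_t, const void *);

static void *my_malloc_hook(size_t size, const void *caller)
{
    (void) caller;
    __malloc_hook = prev_malloc_hook;   // uninstall so the real malloc runs
    void *p = malloc(size);
    // ... a real memory manager would record/pin the region here ...
    prev_malloc_hook = __malloc_hook;
    __malloc_hook = my_malloc_hook;     // reinstall the hook
    return p;
}

int main()
{
    prev_malloc_hook = __malloc_hook;
    __malloc_hook = my_malloc_hook;
    free(malloc(64));                   // this malloc() goes through the hook
    return 0;
}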

On Aug 22, 2005, at 5:56 PM, John Robinson wrote:

> Hi all,
>
> I'm back. Our team has pretty much decided to move to dynamic linking
> for now, as it seems to work fine for us. The SEGV problem looks like
> something introduced between gcc 4.0.0 and 4.0.1, and has something to
> do with pthreads and/or atexit() calls. We are pursuing that on the GCC
> list [gcc-help_at_[hidden]].
>
> The static-link problem with memory manager collisions remains for LAM;
> disabling the private memory manager is a workaround if you are not a
> Myrinet or InfiniBand shop.
>
> I expect that is all that need be said on this list. I will look into
> moving to Open MPI soon, and expect to try out the new downloads. We
> definitely want MPI_THREAD_MULTIPLE long-term.
>
> /jr
> ---
> John Robinson wrote:
>> Hi Jeff,
>>
>> Thanks for the reply. We are still struggling with the SEGV problem,
>> but as I said it is independent of LAM/MPI.
>>
>> I have not pushed us to change to dynamic linking, but we may wind up
>> there.
>>
>> However, notice that even a trivial MPI program fails to link
>> statically on my setup with the Red Hat FC4 RPM for LAM. So there is
>> a problem to be addressed, IMHO.
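
For reference, a trivial program of the kind being described might look like
the following (hypothetical file name; the commands assume LAM's mpiCC
wrapper from the FC4 RPM):

// hello.cc -- trivial MPI program that links dynamically but not statically
#include <mpi.h>
#include <iostream>

int main(int argc, char **argv)
{
    MPI::Init(argc, argv);
    std::cout << "rank " << MPI::COMM_WORLD.Get_rank()
              << " of " << MPI::COMM_WORLD.Get_size() << std::endl;
    MPI::Finalize();
    return 0;
}

// mpiCC hello.cc -o hello            <-- links fine (dynamic)
// mpiCC -static hello.cc -o hello    <-- ld reports multiply-defined
//                                        malloc/free symbols (libc.a vs.
//                                        LAM's memory manager)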
>>
>> I am going away for a week+ but will report back to the list when I
>> return.
>>
>> Thanks again!
>> /jr
>> ---
>> Jeff Squyres wrote:
>>
>>> Sorry for the delay in replying.
>>>
>>> This probably makes sense -- if you compile the rest of your code
>>> statically (and against libc.a), then malloc and friends are included
>>> in your executable. For lack of a longer explanation, I think it's
>>> easy to construct scenarios where the two memory managers run afoul
>>> of each other (or simply create linker clashes).
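
A stripped-down illustration of that kind of clash, with MPI removed
entirely (hypothetical file name): a library that ships its own malloc
collides with the copy in libc.a once everything is linked statically.

// mm.cc -- stand-in for a library's private memory manager that interposes
// on malloc (hypothetical; for illustration only).
#include <cstddef>

static char        pool[1 << 20];        // toy arena
static std::size_t used;

extern "C" void *malloc(std::size_t n)   // same symbol libc.a also defines
{
    n = (n + 15) & ~static_cast<std::size_t>(15);   // keep 16-byte spacing
    if (used + n > sizeof pool)
        return 0;
    void *p = pool + used;
    used += n;
    return p;
}

extern "C" void free(void *p) { (void) p; }   // toy: never releases memory

// With a shared libc the interposed symbols simply win at run time.  With
// "-static", libc.a's malloc.o also defines realloc, calloc, and friends;
// if ld has to pull that archive member in to satisfy one of those, the
// second definition of malloc here becomes a "multiple definition" error
// -- the kind of linker clash described above.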
>>>
>>> Disabling the LAM memory manager is certainly an option here,
>>> especially if you never plan to use IB or GM.
>>>
>>> Is there a reason you need static linking?
>>>
>>>
>>> On Aug 10, 2005, at 4:18 PM, John Robinson wrote:
>>>
>>>
>>>
>>>> Dear lam users,
>>>>
>>>> My bad. I misinterpreted a change in my project which introduced the
>>>> SEGV, since that change happened at roughly the same time that I
>>>> switched to statically-linked MPI. So only the first problem remains:
>>>> the -static flag breaks unless you add --with-memory-manager=none to
>>>> the config (and give up on the ib and gm SSIs).
>>>>
>>>> /jr
>>>> ---
>>>> John Robinson wrote:
>>>>
>>>>
>>>>> Hi lam users,
>>>>>
>>>>> Quick description:
>>>>>
>>>>> Static linking fails with multiply-defined symbols (with the LAM
>>>>> memory manager enabled). A statically-linked test program segfaults
>>>>> in exit() when built with --with-memory-manager=none.
>>>>>
>>>>> Long-winded tale of woe:
>>>>>
>>>>> I have been working on an MPI infrastructure, and ran into a couple
>>>>> of problems. When trying to statically link (with mpiCC), I get an
>>>>> error from ld about symbols in libc being redefined, and libmpi.a is
>>>>> the culprit. So problem number 1 is that I cannot statically link
>>>>> MPI apps in this environment:
>>>>>
>>>>> FC4 / i686 / g++ (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)
>>>>>
>>>>> I figured that this must be due to the overloaded malloc package
>>>>> used to protect users against hardware memory stomping when using
>>>>> InfiniBand or Myrinet, which I do not plan to use. So I took a deep
>>>>> breath, uninstalled the Red Hat LAM distribution, and proceeded to
>>>>> download the sources and build LAM/MPI myself with the following
>>>>> config:
>>>>>
>>>>> ./configure --disable-tv-queue --with-memory-manager=none
>>>>> --without-romio --with-trillium
>>>>>
>>>>> [I don't need ROMIO and thought I might want to experiment with
>>>>> building xmpi].
>>>>>
>>>>> At any rate, I can now link my program okay, but when I execute it,
>>>>> I get a SEGV out of exit():
>>>>>
>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>> 0x00000000 in ?? ()
>>>>> (gdb) where
>>>>> #0 0x00000000 in ?? ()
>>>>> #1 0x080be6bd in __tcf_0 ()
>>>>> #2 0x0812eb02 in exit ()
>>>>> #3 0x080482a2 in main (argc=1, argv=0xbfe20064)
>>>>>
>>>>> If I ask gdb to show me __tcf_0, however, it displays a different
>>>>> one. So it looks like the exit_funcs are getting messed up. The
>>>>> instruction that fails appears to be the result of an incomplete
>>>>> link step [note the "call 0x0"]:
>>>>>
>>>>> 0x080be6a0 <__tcf_0+0>: push %ebp
>>>>> 0x080be6a1 <__tcf_0+1>: mov %esp,%ebp
>>>>> 0x080be6a3 <__tcf_0+3>: sub $0x8,%esp
>>>>> 0x080be6a6 <__tcf_0+6>: mov 0x81d3784,%ecx
>>>>> 0x080be6ac <__tcf_0+12>: test %ecx,%ecx
>>>>> 0x080be6ae <__tcf_0+14>: je 0x80be6cb <__tcf_0+43>
>>>>> 0x080be6b0 <__tcf_0+16>: mov 0x81d378c,%eax
>>>>> 0x080be6b5 <__tcf_0+21>: mov %eax,(%esp)
>>>>> 0x080be6b8 <__tcf_0+24>: call 0x0
>>>>> 0x080be6bd <__tcf_0+29>: mov 0x81d3784,%eax
>>>>> 0x080be6c2 <__tcf_0+34>: mov %eax,0x8(%ebp)
>>>>> 0x080be6c5 <__tcf_0+37>: leave
>>>>> 0x080be6c6 <__tcf_0+38>: jmp 0x81141dc <_ZdlPv>
>>>>> 0x080be6cb <__tcf_0+43>: leave
>>>>> 0x080be6cc <__tcf_0+44>: ret
>>>>> 0x080be6cd <__tcf_0+45>: nop
>>>>>
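
For what it's worth, __tcf_N is the name GCC gives to a compiler-generated
cleanup function for an object with static storage duration; it is
registered via atexit()/__cxa_atexit() so the destructor runs from inside
exit(). Roughly the pattern below (hypothetical class name, not the actual
test program); under that reading, the "call 0x0" at __tcf_0+24 is a call
whose relocation was never filled in during the static link.

// static_dtor_sketch.cc -- where a __tcf_N function comes from
#include <cstdlib>

struct Tracker {
    ~Tracker() { /* runs from exit(), via the registered cleanup function */ }
};

static Tracker tracker;   // GCC emits __tcf_0 to destroy this object and
                          // registers it during static initialization

int main()
{
    std::exit(0);   // exit() walks the registered cleanup list and calls
                    // __tcf_0, which runs the destructor -- in the
                    // disassembly above that path ends by tail-calling
                    // operator delete (the "jmp ... <_ZdlPv>")
}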
>>>>> All my test program does is try to instantiate a class that has
>>>>> some Intracomm members. If I do not instantiate it, the problem
>>>>> stops (or is masked). The same error happens whether I instantiate
>>>>> the class with "new" or declare it in main().
>>>>>
>>>>> I may be able to convince the rest of my project that dynamic
>>>>> linking is okay, but maybe that is just deferring a problem that
>>>>> will still crop up eventually. My test program did run its basic
>>>>> steps successfully when linked dynamically, but maybe I was just
>>>>> lucky.
>>>>>
>>>>> Has anyone got a fix for this? Or even seen it?
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> John Robinson
>>>>> Vertica Systems
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/