LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-06-11 08:42:02


I notice that the man page for dlerror(3) on Linux says:

-----
If dlopen fails for any reason, it returns NULL. A human readable string
describing the most recent error that occurred from any of the dl routines
(dlopen, dlsym or dlclose) can be extracted with dlerror(). dlerror
returns NULL if no errors have occurred since initialization or since it
was last called.
-----

This tends to imply that it may not be proper to call dlerror() before
dlopen() -- the "...if no errors have occurred since initialization..."
part is what I'm keying on. "Initialization" is not defined, so it
could mean either process initialization or dl library initialization
(effectively, dlopen).

So if this is true, here are two guesses: either a) you were getting lucky
with prior versions of LAM, or b) some other part of the system was
calling dlopen() before you called dlerror() (and therefore initializing
the dl library for you).
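
Either way, one defensive pattern is to consult dlerror() only after a dl
call has actually reported failure, which is essentially the workaround
described below. A minimal sketch, assuming the crash really is tied to
calling dlerror() before any dlopen(); the helper name open_or_explain is
made up for illustration and is not part of LAM or libdl:

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper (not a LAM or libdl API): tries to dlopen() `path`.
// On success, stores the handle in *handle_out and returns an empty string.
// On failure, stores NULL and returns the dlerror() text. dlerror() is only
// ever called after dlopen() has been called and has reported a failure.
std::string open_or_explain(const char *path, void **handle_out)
{
    assert(handle_out != nullptr);   // caller must supply somewhere to put the handle

    *handle_out = dlopen(path, RTLD_NOW);
    if (*handle_out != nullptr)
        return std::string();        // success: no need to touch dlerror()

    const char *msg = dlerror();     // safe ordering: dlopen() has already run
    return msg != nullptr ? std::string(msg)
                          : std::string("dlopen failed (no dlerror text)");
}
```

Callers would then test the returned string instead of calling dlerror()
themselves. As in the reproducer below, older glibc installations need -ldl
on the link line.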

On Thu, 10 Jun 2004, Richard Hadsell wrote:

> Richard Hadsell wrote:
>
>> ... The only workaround I have found is to not call dlerror(). It seems
>> that dlopen() works fine, but calls to dlerror crash. I'll try to
>> reproduce it for you with a simple program.
>
> (1) I have learned that the crash only occurs when dlerror() is called
> before any call to dlopen(). My workaround is to only use dlerror()
> after the first attempt to use dlopen(). Note that this problem does
> not occur in my non-MPI versions of the application, and it does not
> occur with lam-6.6b2.
>
> (2) Here is a simple reproducer for the bug. If you comment out the
> call to dlerror(), it will run fine.
>
> I configured lam-7.0.6 with these options:
>
> configure --prefix=/netDISKS/master/netmt/LINUX_X86/lamb/lam-7.0.6
> --with-exceptions --with-exflags=" " --without-fc --with-signal=SIGURG
> --with-trillium
>
> and with these environment variables:
>
> CC=icc
> CXX=icc
> LDFLAGS=-Wl,-rpath,/usr/local/intel/compiler80/ia32/lib
> CXXLDFLAGS=-Wl,-rpath,/usr/local/intel/compiler80/ia32/lib
>
> The Intel compiler is not quite the latest:
>
> Intel(R) C++ Compiler for 32-bit applications, Version 8.0 Build 20040304Z
> Package ID: l_cc_pc_8.0.058_pe061
>
> The Linux is Red Hat 7.3 with kernel 2.4.18-18.7.x and glibc-2.2.5-42.
>
> 65% cat LamMpiBug1.cc
> #include <dlfcn.h>
> #include <iostream>
> #include <mpi.h>
>
> int main (int argc, const char * const *argv)
> {
>   std::cout << "calling MPI_Init" << std::endl;
>   if (MPI_Init (&argc, (char***) &argv) != MPI_SUCCESS)
>     std::cout << "MPI_Init failed" << std::endl;
>   std::cout << "calling dlerror" << std::endl;
>   dlerror ();
>   std::cout << "returned from dlerror" << std::endl;
>   MPI_Finalize ();
>   return 0;
> }
> 66% laminfo
> LAM/MPI: 7.0.6
> Prefix: /netDISKS/master/netmt/LINUX_X86/lamb/lam-7.0.6
> Architecture: i686-pc-linux-gnu
> Configured by: hadsell
> Configured on: Wed Jun 9 13:56:39 EDT 2004
> Configure host: carman
> C bindings: yes
> C++ bindings: yes
> Fortran bindings: no
> C profiling: yes
> C++ profiling: yes
> Fortran profiling: no
> ROMIO support: yes
> IMPI support: no
> Debug support: no
> Purify clean: no
> SSI boot: globus (Module v0.5)
> SSI boot: rsh (Module v1.0)
> SSI coll: lam_basic (Module v7.0)
> SSI coll: smp (Module v1.0)
> SSI rpi: crtcp (Module v1.0.1)
> SSI rpi: lamd (Module v7.0)
> SSI rpi: sysv (Module v7.0)
> SSI rpi: tcp (Module v7.0)
> SSI rpi: usysv (Module v7.0)
> 67% cat lamhosts
> carman
> ava
> 68% lamboot -x -s -H -b lamhosts -v
> n-1<512> ssi:boot:base:linear: booting n0 (carman)
> n-1<512> ssi:boot:base:linear: booting n1 (ava)
> n-1<512> ssi:boot:base:linear: finished
> 69% mpiCC -g -ldl -o ~/bin/LINUX_X86/LamMpiBug1 LamMpiBug1.cc
> 70% mpirun -np 2 -ssi rpi tcp -nsigs -O -pty LamMpiBug1
> calling MPI_Init
> calling MPI_Init
> calling dlerror
> calling dlerror
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 527 failed on node n0 (10.1.13.116) due to signal 11.
> -----------------------------------------------------------------------------
> 71% lamhalt
>
> LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
>
> 72%
>
>
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/