Richard Hadsell wrote:
> ... The only workaround I have found is to not call dlerror(). It
> seems that dlopen() works fine, but calls to dlerror crash. I'll try
> to reproduce it for you with a simple program.
(1) I have learned that the crash occurs only when dlerror() is called
before any call to dlopen(). My workaround is to call dlerror() only
after the first attempt to use dlopen(). Note that this problem does
not occur in the non-MPI versions of my application, and it does not
occur with lam-6.6b2.
(2) Here is a simple reproducer for the bug. If you comment out the
call to dlerror(), it will run fine.
I configured lam-7.0.6 with these options:
configure --prefix=/netDISKS/master/netmt/LINUX_X86/lamb/lam-7.0.6 --with-exceptions --with-exflags=" " --without-fc --with-signal=SIGURG --with-trillium
and with these environment variables:
CC=icc
CXX=icc
LDFLAGS=-Wl,-rpath,/usr/local/intel/compiler80/ia32/lib
CXXLDFLAGS=-Wl,-rpath,/usr/local/intel/compiler80/ia32/lib
The Intel compiler is not quite the latest:
Intel(R) C++ Compiler for 32-bit applications, Version 8.0 Build 20040304Z Package ID: l_cc_pc_8.0.058_pe061
The Linux distribution is Red Hat 7.3 with kernel 2.4.18-18.7.x and glibc-2.2.5-42.
65% cat LamMpiBug1.cc
#include <dlfcn.h>
#include <iostream>
#include <mpi.h>

int main (int argc, const char * const *argv)
{
    std::cout << "calling MPI_Init" << std::endl;
    if (MPI_Init (&argc, (char***) &argv) != MPI_SUCCESS)
        std::cout << "MPI_Init failed" << std::endl;
    std::cout << "calling dlerror" << std::endl;
    dlerror ();
    std::cout << "returned from dlerror" << std::endl;
    MPI_Finalize ();
    return 0;
}
66% laminfo
LAM/MPI: 7.0.6
Prefix: /netDISKS/master/netmt/LINUX_X86/lamb/lam-7.0.6
Architecture: i686-pc-linux-gnu
Configured by: hadsell
Configured on: Wed Jun 9 13:56:39 EDT 2004
Configure host: carman
C bindings: yes
C++ bindings: yes
Fortran bindings: no
C profiling: yes
C++ profiling: yes
Fortran profiling: no
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (Module v0.5)
SSI boot: rsh (Module v1.0)
SSI coll: lam_basic (Module v7.0)
SSI coll: smp (Module v1.0)
SSI rpi: crtcp (Module v1.0.1)
SSI rpi: lamd (Module v7.0)
SSI rpi: sysv (Module v7.0)
SSI rpi: tcp (Module v7.0)
SSI rpi: usysv (Module v7.0)
67% cat lamhosts
carman
ava
68% lamboot -x -s -H -b lamhosts -v
n-1<512> ssi:boot:base:linear: booting n0 (carman)
n-1<512> ssi:boot:base:linear: booting n1 (ava)
n-1<512> ssi:boot:base:linear: finished
69% mpiCC -g -ldl -o ~/bin/LINUX_X86/LamMpiBug1 LamMpiBug1.cc
70% mpirun -np 2 -ssi rpi tcp -nsigs -O -pty LamMpiBug1
calling MPI_Init
calling MPI_Init
calling dlerror
calling dlerror
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 527 failed on node n0 (10.1.13.116) due to signal 11.
-----------------------------------------------------------------------------
71% lamhalt
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
72%
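For anyone who wants to apply the workaround from (1), here is a rough sketch of how it could be wrapped. The names guarded_dlopen, guarded_dlerror and g_dlopen_attempted are hypothetical helpers invented for this illustration. They are not part of LAM/MPI or libdl, and the flag is not thread-safe. The sketch only avoids the crashing call; it does not address the underlying bug.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical flag: set once the program has attempted any dlopen().
// Not thread-safe; a sketch for single-threaded use only.
static bool g_dlopen_attempted = false;

// Hypothetical wrapper: records the attempt, then forwards to dlopen().
void *guarded_dlopen (const char *path, int flags)
{
    g_dlopen_attempted = true;
    return dlopen (path, flags);
}

// Hypothetical wrapper: calls dlerror() only after a dlopen() attempt.
// Returns an empty string when there is no pending error message or
// when dlopen() has not been attempted yet.
std::string guarded_dlerror ()
{
    if (!g_dlopen_attempted)
        return std::string ();
    const char *msg = dlerror ();
    return msg ? std::string (msg) : std::string ();
}
```

Application code would then call guarded_dlopen() and guarded_dlerror() in place of the raw functions, so that dlerror() is never reached before the first dlopen() attempt.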
--
Dick Hadsell 914-259-6320 Fax: 914-259-6499
Reply-to: hadsell_at_[hidden]
Blue Sky Studios http://www.blueskystudios.com
44 South Broadway, White Plains, NY 10601