
LAM/MPI General User's Mailing List Archives


From: Bailey, Richard T (US SSA) (richard.t.bailey_at_[hidden])
Date: 2003-10-02 13:27:26


LAM 7.0 has some problems with pthreads. See, for example, the discussion
"MPI_Irecv() blocks when called from a pthread". I solved my problem by
upgrading to LAM 7.0.2.
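
A side note on the gdb output quoted below, offered as a sketch rather than a diagnosis: by default gdb stops the program whenever SIGUSR2 is delivered, so "Program received signal SIGUSR2" on its own does not show that the rank crashed. Assuming the signal is being used internally by a library, the following standard gdb commands let it pass through to the program without stopping, which may make it easier to reach the real failure:

```
(gdb) info signals SIGUSR2
(gdb) handle SIGUSR2 nostop noprint pass
(gdb) continue
```

If the rank still dies with SIGUSR2 passed through, the signal handling itself is the more likely suspect, and the upgrade mentioned above is the first thing to try.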

-----Original Message-----
From: Luke Palmer [mailto:lop_at_[hidden]]
Sent: Thursday, October 02, 2003 9:20 AM
To: lam_at_[hidden]
Subject: LAM: Program crash with SIGUSR2

Hello,

We have an MPI program that keeps crashing. At first it segfaulted
intermittently, so we compiled it with debugging symbols, runtime error
checking, and no optimization. We monitored it by starting gdb for each
rank. One of the ranks always dies with this message:

Program received signal SIGUSR2, User defined signal 2.
[Switching to Thread 16384 (LWP 4518)]
0x4011a698 in read () from /lib/i686/libpthread.so.0

(gdb) where

#0 0x4011a698 in read () from /lib/i686/libpthread.so.0
#1 0x400b744c in __dtors_list_end() from /usr/lib/libmpi.so.0

Our setup is Red Hat 9 with LAM 7.0 built with the Intel compilers on SMP
Xeon boxes.

Now, I have read that LAM uses SIGUSR2 internally, and that pthreads
does as well. Does anyone know what could be causing this problem, and
how we could fix it?

Thanks
-Luke

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/