LAM/MPI General User's Mailing List Archives

From: Nick Nevin (njnevin_at_[hidden])
Date: 2003-07-22 22:39:48


On Tue, Jul 22, 2003 at 01:10:39PM -0400, Rodrigo Amestica wrote:
> I have a very simple test that I run after installing an MPI
> implementation. This program installs a handler on signal SIGUSR1. The
> handler from rank=0 sends a message (MPI_Issend) to rank=1 and from
> rank=1 it sends a message to rank=0. The main routine of the program is
> normally pending on MPI_Recv.
>
> This program used to work under MPICH 0.93, but on LAM 6.5.9 the
> MPI_Recv seems to hang after receiving the very first message.
>
> Did I select the wrong LAM version, or is the above simply an
> unsupported scenario?
>
> Thanks,
> Rodrigo
>
> ps: my LAM 6.5.9 configuration was this (redhat 7.2, gcc = 2.96)
> ./configure --with-threads --enable-shared=yes --without-mpi2cpp
> --without-fc --without-romio --prefix=/export/data/ramestic/alma/lam |
> tee c.log

LAM uses one of SIGUSR1 or SIGUSR2 internally, so your handler may be
clashing with LAM's own use of the signal. Even if it is not, the
problem described below still applies. I believe that from 6.5.9
onwards LAM uses SIGUSR2 by default. You can check which signal your
version uses internally by looking at the value of LAM_SIGUSR in
lam_config.h.

That said, calling MPI functions inside signal handlers is definitely
not supported in LAM (and, I would expect, in most MPI
implementations). MPI aside, in general the only operations that can
safely be called inside a signal handler are those that are
async-signal-safe (e.g. most system calls). This excludes things like
malloc and anything that may lead to malloc being called. The fact that
your code works with MPICH is probably just blind luck, and you
shouldn't count on it always working.
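
One common way around this restriction is to have the handler do nothing
except set a flag, and to do the MPI work later from the main loop. The
following is only a minimal sketch of that pattern, not something taken
from LAM or from your program: the names usr1_pending, on_usr1,
install_usr1_handler and take_pending_usr1 are made up for illustration,
and SIGUSR1 is used only because your test does (check LAM_SIGUSR first
to make sure it is not the signal LAM itself is using).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose sigaction() under strict -std= modes */
#endif
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop. volatile sig_atomic_t is
   the object type that may safely be written from inside a handler. */
static volatile sig_atomic_t usr1_pending = 0;

/* The handler only records that the signal arrived:
   no MPI calls, no malloc, no stdio. */
static void on_usr1(int signo)
{
    (void)signo;
    usr1_pending = 1;
}

/* Hypothetical helper: install the handler. Returns 0 on success, -1 on error. */
int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;     /* restart interrupted system calls where possible */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical helper: returns 1 if a signal has been noted since the last
   call, else 0. Signals arriving close together are coalesced into one. */
int take_pending_usr1(void)
{
    if (!usr1_pending)
        return 0;
    usr1_pending = 0;
    return 1;
}
```

The main loop would then call take_pending_usr1() between MPI calls and
issue the send from there, in normal (non-handler) context. Note that
this assumes the main loop is not parked in a blocking MPI_Recv, since
the flag is only examined when control returns to your code; polling
with MPI_Iprobe is one possible way to arrange that.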

-nick