Hi,
I'm running a fairly straightforward program for computing solutions of
parabolic PDEs using finite differences. From LAM-MPI I use only the Send,
Recv and Sendrecv functions. I have one problem and two questions:
Problem: running this program in parallel with mpirun (without -nsigs)
causes a SIGFPE on all nodes except n0, printing the following message
(or similar, depending on the number of nodes):
MPI process rank 3 (n3, p8344) caught a SIGFPE.
MPI process rank 1 (n1, p8820) caught a SIGFPE.
MPI process rank 2 (n2, p9553) caught a SIGFPE.
or it prints one of the following messages when the program is run with
mpirun -nsigs:
---
MPI_Wait: process in local group is dead (rank 0, MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Wait()
Rank (0, MPI_COMM_WORLD): - MPI_Sendrecv()
Rank (0, MPI_COMM_WORLD): - main()
One of the processes ...
---
MPI_Wait: process in local group is dead (rank 0, MPI_COMM_WORLD)
One of the processes ...
--- (or only: )
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 9543 failed on node n2 with exit status 1.
---
Running this program serially works perfectly, with the expected results
and without any FPE. I encounter the same problem both on an 8-node
Pentium 4 Linux cluster with LAM 6.5.6 and on a single P4 Linux
workstation with LAM 6.5.8 and one CPU (with, for example, cpu=2 in the
hostfile).
Question 1: Using this information, can anybody guess where the problem
might be? Since the computation runs fine serially, I would guess that the
problem is not in my program but in the way I use LAM. On the other hand,
can sending data between nodes cause an FPE?
Question 2: I'm using Debian 3.0 with the default compiler, gcc 3.3.2.
With this compiler I cannot link LAM programs, while with 2.95 I can (am I
right that there is a binary incompatibility between these versions of
gcc?). mpiCC uses g++, so I have to compile my programs with g++-2.95 and
pass the include directories and libraries by hand. It works, but it's a
bit inconvenient. Is there a way to tell mpiCC which compiler to use?
As far as I can tell from the man page, the only parameter mpiCC has is
-showme.
Thanks for any advice and hints,
Vladimir