
LAM/MPI General User's Mailing List Archives


From: Hugh Merz (merz_at_[hidden])
Date: 2005-03-30 12:19:45


Is the file "bt.dat" visible from all of the nodes (either on a
network-shared disk, or copied to the same location on each node)?
Signal 6 is a general abort (SIGABRT).
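
If the nodes do not share a filesystem, a quick way to check is to
list the file on every node from the shell. This is only a sketch:
the hostnames and the path to bt.dat below are placeholders for
whatever your cluster actually uses:

  # placeholders: substitute your real hostnames and the directory
  # the BLACS tester is run from
  for host in node1 node2 node3 node4; do
      echo "== $host =="
      ssh $host 'ls -l /path/to/bt.dat'
  done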

You can find the version of the mpif.h file by opening it in a text
editor; if it is the LAM mpif.h, it will be clearly marked as such.
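
For example (the include path below is only a guess; point it at
wherever your LAM installation's mpif.h actually lives):

  # print the header comment of the mpif.h your program was compiled against
  head -20 /usr/local/lam/include/mpif.h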

Hugh

On Wed, 30 Mar 2005, Srinivasa Prade Patri wrote:

> Hi!
> I reinstalled the BLACS library with the previous macro settings and it is still not working. But this time I am not getting the invalid communicator error but different ones. I ran the exe files on 4 nodes and here are the errors...
>
> ------------------------------------------------------------------------------------- [patri_at_e01 ~]$ mpirun -v -np 4 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0
> 1647 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n0 (o)
> 1225 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n1
> 1219 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n2
> 1219 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n3
> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
> from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>
> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
> from {-1,-1}, pnum=1, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>
> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
> from {-1,-1}, pnum=2, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>
> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
> from {-1,-1}, pnum=3, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>
> open: No such file or directory
> apparent state: unit 11 named bt.dat
> lately writing direct unformatted external IO
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 1647 failed on node n0 (10.5.0.1) due to signal 6.
> -----------------------------------------------------------------------------
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - MPI_Allreduce()
> Rank (1, MPI_COMM_WORLD): - MPI_Comm_dup()
> Rank (1, MPI_COMM_WORLD): - main()
> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> Rank (2, MPI_COMM_WORLD): - MPI_Recv()
> Rank (2, MPI_COMM_WORLD): - MPI_Allreduce()
> Rank (2, MPI_COMM_WORLD): - MPI_Comm_dup()
> Rank (2, MPI_COMM_WORLD): - main()
> MPI_Recv: process in local group is dead (rank 3, MPI_COMM_WORLD)
> Rank (3, MPI_COMM_WORLD): Call stack within LAM:
> Rank (3, MPI_COMM_WORLD): - MPI_Recv()
> Rank (3, MPI_COMM_WORLD): - MPI_Allreduce()
> Rank (3, MPI_COMM_WORLD): - MPI_Comm_dup()
> Rank (3, MPI_COMM_WORLD): - main()
> -------------------------------------------------------------------------------------
>
> What I observe is that the processes on all nodes except rank 0 are dead.
> I have attached the Bmake.inc file with this mail.
>
> Thanking You
>
> Regards
> Srinivasa Patri
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>