It looks like you might have a messed-up installation. You have several
include files scattered around the place, and you're getting different
errors now. Why don't you try to get to a stable base point to start
with? The includes in /usr/include might be part of your standard
*nix/*nux installation, depending on where you installed LAM (where did
you install it?). Get rid of as many of the other LAM or MPI pieces on
your system as you can (if it's your system).
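Something like this should turn up any stray MPI headers (the search
paths here are just guesses; adjust them for your machine):

    find /usr/include /usr/local /opt /home \
        -name 'mpif.h' -o -name 'mpi.h' 2>/dev/null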
Reconfigure LAM with a unique installation location that's not part of
the standard system, like /usr/local/lam-7.1.1 or /home/srini/lam-7.1.1
("./configure --prefix=/usr/local/lam-7.1.1", etc.). Then rebuild BLACS
pointing at that particular version of LAM. This won't take long, and
you'll know you're using just one LAM. Then see if you can get repeatable
behaviour.
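If it helps, here's a rough sketch of the steps (the prefix and the
Bmake.inc settings below are from a typical setup, not your machine, so
adjust to match yours):

    cd lam-7.1.1                    # your LAM source tree
    ./configure --prefix=/usr/local/lam-7.1.1
    make
    make install                    # root may be needed for /usr/local
    export PATH=/usr/local/lam-7.1.1/bin:$PATH

Then point the BLACS Bmake.inc at that one LAM, e.g.

    MPIdir    = /usr/local/lam-7.1.1
    MPIINCdir = $(MPIdir)/include
    MPILIBdir = $(MPIdir)/lib

and rebuild BLACS from its top-level directory ("make mpi", per the
BLACS install notes). Running laminfo from the new bin directory will
confirm which LAM you're actually picking up.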
Damien
> Is the file "bt.dat" viewable from all of the nodes (either on a network
> shared disk, or copied to the same location on each node)? Signal 6 is a
> general abort.
>
> You can find the version of the mpif.h file by looking at it in a text
> editor; if it is the LAM mpif.h, it will be clearly marked as such.
>
> Hugh
>
> On Wed, 30 Mar 2005, Srinivasa Prade Patri wrote:
>
>> Hi!
>> I reinstalled the BLACS library with the previous macro settings and
>> it is still not working. But this time I am not getting the invalid
>> communicator error but different ones. I ran the executables on 4
>> nodes and here are the errors...
>>
>> -------------------------------------------------------------------------------------
>> [patri_at_e01 ~]$ mpirun -v -np 4
>> /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0
>> 1647 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n0 (o)
>> 1225 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n1
>> 1219 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n2
>> 1219 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n3
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=1, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=2, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=3, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> open: No such file or directory
>> apparent state: unit 11 named bt.dat
>> lately writing direct unformatted external IO
>> -----------------------------------------------------------------------------
>> One of the processes started by mpirun has exited with a nonzero exit
>> code. This typically indicates that the process finished in error.
>> If your process did not finish in error, be sure to include a "return
>> 0" or "exit(0)" in your C code before exiting the application.
>>
>> PID 1647 failed on node n0 (10.5.0.1) due to signal 6.
>> -----------------------------------------------------------------------------
>> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
>> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (1, MPI_COMM_WORLD): - MPI_Allreduce()
>> Rank (1, MPI_COMM_WORLD): - MPI_Comm_dup()
>> Rank (1, MPI_COMM_WORLD): - main()
>> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
>> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (2, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (2, MPI_COMM_WORLD): - MPI_Allreduce()
>> Rank (2, MPI_COMM_WORLD): - MPI_Comm_dup()
>> Rank (2, MPI_COMM_WORLD): - main()
>> MPI_Recv: process in local group is dead (rank 3, MPI_COMM_WORLD)
>> Rank (3, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (3, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (3, MPI_COMM_WORLD): - MPI_Allreduce()
>> Rank (3, MPI_COMM_WORLD): - MPI_Comm_dup()
>> Rank (3, MPI_COMM_WORLD): - main()
>> -------------------------------------------------------------------------------------
>>
>> What I observe is that the processes on all nodes except rank 0 are
>> dead. I have attached the Bmake.inc file with this mail.
>>
>> Thank you.
>>
>> Regards
>> Srinivasa Patri
>>
>>
>>