
LAM/MPI General User's Mailing List Archives


From: Srinivasa Prade Patri (Srinivasa.Patri_at_[hidden])
Date: 2005-03-30 20:14:04


Hi!
     I have realized the mistake I was making while running the BLACS test suite executables. Previously, I was running the command from a directory other than the one containing the file "bt.dat", which is why I was getting the error

" open: no such file are directory
apparent state: unit 11 named bt.dat"
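
    As I understand it, the tester opens "bt.dat" by its relative name, so the name is resolved against the directory mpirun is launched from. Just as an illustration (the real tester does this with a Fortran OPEN on unit 11, not C), the behaviour is equivalent to:

#include <stdio.h>

int main(void)
{
    /* "bt.dat" is a relative name, so it is resolved against the
     * current working directory -- the directory the job was started
     * from. Running from anywhere else gives "No such file or directory". */
    FILE *f = fopen("bt.dat", "r");
    if (f == NULL) {
        perror("open");   /* e.g. "open: No such file or directory" */
        return 1;
    }
    fclose(f);
    return 0;
}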

    Now I ran the mpirun command from the directory where bt.dat is located, and the tests pass. But I did not understand the last statement in the results of the auxiliary test:

"The final auxiliary test is for BLACS_ABORT.
 Immediately after this message, all processes should be killed.
 If processes survive the call, your BLACS_ABORT is incorrect.
{0,11}, pnum=11, Contxt=0, killed other procs, exiting with error #-1.

-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 1743 failed on node n0 (10.5.0.1) with exit status 1.
-----------------------------------------------------------------------------
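
    If I understand correctly, BLACS_ABORT ends up calling something like MPI_Abort, so mpirun reporting a nonzero exit code here is exactly what the test expects. A minimal sketch of that behaviour, assuming the abort really does go through MPI_Abort:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* MPI_Abort kills all processes in the communicator, and mpirun
     * then reports a nonzero exit code -- the same message shown above. */
    MPI_Abort(MPI_COMM_WORLD, -1);

    /* Never reached; if it were, the abort would be broken. */
    MPI_Finalize();
    return 0;
}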

    From this output, I think the final test has also passed, but I want to make sure that all the tests have passed. I have attached the complete test results to this mail. I also see a warning, and I do not know how important it is:

"BLACS WARNING 'No need to set message ID range due to MPI communicator.' from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'. "

    Thank you all for your helpful suggestions, and I apologize for bothering you with so many mails.

Thanking You

Regards
Srinivasa Patri

-----Original Message-----
From: Hugh Merz <merz_at_[hidden]>
To: sppatr2_at_[hidden], General LAM/MPI mailing list <lam_at_[hidden]>
Date: Wed, 30 Mar 2005 17:07:00 -0500 (EST)
Subject: Re: LAM: How to know "bt.dat" executable of BLACS tester is viewable to all nodes

Hello Srinivasa,

   If all the nodes can see the filesystem then there should not be a
problem. The error below contains the following message:

--snip--
open: No such file or directory
--/snip--

   This would indicate that there is a problem finding a file. I'm not
familiar with VNFS - maybe it will not let multiple programs open the same
file at once? There seems to be very little information about this system
on the web. Can you run the BLACS tester programs successfully on just one
node? That would rule out any installation problems.

   As someone previously mentioned, you may be best off reinstalling
LAM into a new directory and going from there, as there appears to be
some confusion as to which files you are using to compile the BLACS.
Definitely make sure you can compile simple MPI programs and run them
before going any further.
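
For example, something as small as the program below should build with
LAM's mpicc and run under mpirun (e.g. mpicc hello.c -o hello && mpirun
-np 4 hello). This is just a sketch for sanity-checking the installation,
not part of the BLACS tester:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process reports its rank; if this runs cleanly on all
     * nodes, the LAM installation itself is fine. */
    printf("hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

If that compiles and prints one line per rank on all of your nodes, any
remaining problem is in the BLACS build rather than in LAM itself.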

Hugh

On Wed, 30 Mar 2005, Srinivasa Prade Patri wrote:

> Hi!
> We have a VNFS file system here, and all nodes can see the home directory where I installed the BLACS library, so I think all nodes can see "bt.dat". Is installing the BLACS library sufficient to make it work on all nodes, or do I need to reboot all the nodes? I am not sure how to make the BLACS library work on all nodes. How can I make sure that "bt.dat" is viewable?
>
> Thanking you.
>
> Regards
> Srinivasa Patri
> -----Original Message-----
> From: Hugh Merz <merz_at_[hidden]>
> To: sppatr2_at_[hidden], General LAM/MPI mailing list <lam_at_[hidden]>
> Date: Wed, 30 Mar 2005 12:19:45 -0500 (EST)
> Subject: Re: LAM: Errors running the TEST suite for BLACS library
>
> Is the file "bt.dat" viewable from all of the nodes (either on a network
> shared disk, or copied to the same location on each node)? Signal 6 is a
> general abort.
>
> You can find the version of the mpif.h file by looking at it in a file
> editor; if it is the LAM mpif.h, it will be clearly marked as such.
>
> Hugh
>
> On Wed, 30 Mar 2005, Srinivasa Prade Patri wrote:
>
>> Hi!
>> I reinstalled the BLACS library with the previous macro settings and it is still not working. This time I am not getting the invalid communicator error, but different ones. I ran the executables on 4 nodes and here are the errors...
>>
>> -------------------------------------------------------------------------------------
>> [patri_at_e01 ~]$ mpirun -v -np 4 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0
>> 1647 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n0 (o)
>> 1225 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n1
>> 1219 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n2
>> 1219 /home/BLACS/TESTING/EXE/xFbtest_MPI-LINUX-0 running on n3
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=1, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=2, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> BLACS WARNING 'No need to set message ID range due to MPI communicator.'
>> from {-1,-1}, pnum=3, Contxt=-1, on line 18 of file 'blacs_set_.c'.
>>
>> open: No such file or directory
>> apparent state: unit 11 named bt.dat
>> lately writing direct unformatted external IO
>> -----------------------------------------------------------------------------
>> One of the processes started by mpirun has exited with a nonzero exit
>> code. This typically indicates that the process finished in error.
>> If your process did not finish in error, be sure to include a "return
>> 0" or "exit(0)" in your C code before exiting the application.
>>
>> PID 1647 failed on node n0 (10.5.0.1) due to signal 6.
>> -----------------------------------------------------------------------------
>> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
>> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (1, MPI_COMM_WORLD): - MPI_Allreduce()
>> Rank (1, MPI_COMM_WORLD): - MPI_Comm_dup()
>> Rank (1, MPI_COMM_WORLD): - main()
>> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
>> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (2, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (2, MPI_COMM_WORLD): - MPI_Allreduce()
>> Rank (2, MPI_COMM_WORLD): - MPI_Comm_dup()
>> Rank (2, MPI_COMM_WORLD): - main()
>> MPI_Recv: process in local group is dead (rank 3, MPI_COMM_WORLD)
>> Rank (3, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (3, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (3, MPI_COMM_WORLD): - MPI_Allreduce()
>> Rank (3, MPI_COMM_WORLD): - MPI_Comm_dup()
>> Rank (3, MPI_COMM_WORLD): - main()
>> -------------------------------------------------------------------------------------
>>
>> What I observe is that the processes on all nodes except rank 0 are dead.
>> I have attached the Bmake.inc file with this mail.
>>
>> Thanking You
>>
>> Regards
>> Srinivasa Patri
>>
>>
>>
>>
>
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>