LAM/MPI General User's Mailing List Archives

From: Bhanu Prakash (sneham02i_at_[hidden])
Date: 2009-02-26 14:26:39


It's OK, and all the executables are in the same location on all nodes.
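For completeness, a quick way to confirm that every node resolves mpirun to the same installation is a small script along these lines (the node names and expected path are the ones from this thread; the `report` helper is just for illustration):

```shell
#!/bin/sh
# Expected mpirun location (the LAM install path from this thread).
EXPECTED=/paracel/lam/bin/mpirun

# report NODE PATH: print OK if PATH matches EXPECTED, else MISMATCH.
report() {
    if [ "$2" = "$EXPECTED" ]; then
        echo "$1: OK"
    else
        echo "$1: MISMATCH -> $2"
    fi
}

# In practice you would feed it the output of a remote which, e.g.:
#   for n in compute-0-0 compute-0-2 compute-0-3 compute-0-4; do
#       report "$n" "$(ssh "$n.local" which mpirun)"
#   done
# Illustrated here with the two paths seen in this thread:
report compute-0-0 /paracel/lam/bin/mpirun   # prints: compute-0-0: OK
report compute-0-1 /usr/bin/mpirun           # prints: compute-0-1: MISMATCH -> /usr/bin/mpirun
```

Any MISMATCH line points at a node whose PATH still resolves mpirun to a different MPI installation.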

On Thu, Feb 26, 2009 at 11:18 PM, Abhirup Chakraborty
<abhirupc91_at_[hidden]> wrote:

> Sorry, I didn't notice the "which" commands.
> Did you place the executable files under the same directories (i.e., the
> file paths are the same) on both machines?
> It seems the process on machine 1 is not running. If lamboot runs without
> error, this usually does not happen unless there is a problem with the
> file location.
>
>
>
> On Thu, Feb 26, 2009 at 11:41 AM, Bhanu Prakash <sneham02i_at_[hidden]> wrote:
>
>> I added it to the PATH on each node, so when I executed the command from
>> the node it showed the path:
>>
>> [webstructure_at_compute-0-0 ~]$ which mpirun
>> /paracel/lam/bin/mpirun
>>
>> Later I found that /usr/bin on all compute nodes had soft links to
>> mpirun, mpiCC, mpicc, etc. I removed all the soft links and added an
>> environment file in .ssh/. After doing this, the outputs were as follows:
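For reference: a non-interactive `ssh host command` does not read ~/.bashrc on many setups, which is why the remote PATH can differ from what an interactive login shows. OpenSSH will load per-user variables from ~/.ssh/environment, but only when sshd is configured with PermitUserEnvironment yes. A minimal file along these lines (the exact PATH value here is illustrative):

```
# ~/.ssh/environment on each compute node
# (requires "PermitUserEnvironment yes" in /etc/ssh/sshd_config,
#  followed by an sshd restart)
PATH=/paracel/lam/bin:/usr/bin:/bin
```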
>>
>> [webstructure_at_hpc-icrisat bin]$ ssh compute-0-0.local which mpirun
>> /paracel/lam/bin/mpirun
>> [webstructure_at_hpc-icrisat bin]$ ssh compute-0-2.local which mpirun
>> /paracel/lam/bin/mpirun
>> [webstructure_at_hpc-icrisat bin]$ ssh compute-0-3.local which mpirun
>> /paracel/lam/bin/mpirun
>> [webstructure_at_hpc-icrisat bin]$ ssh compute-0-4.local which mpirun
>> /paracel/lam/bin/mpirun
>>
>> But I still get the same error message when I run my application.
>> The error is:
>>
>> 0 - MPI_SEND : Invalid rank 1
>> [0] Aborting program !
>> [0] Aborting program!
>>
>> -----------------------------------------------------------------------------
>> It seems that [at least] one of the processes that was started with
>> mpirun did not invoke MPI_INIT before quitting (it is possible that
>> more than one process did not invoke MPI_INIT -- mpirun was only
>> notified of the first one, which was on node n0).
>>
>> mpirun can *only* be used with MPI programs (i.e., programs that
>> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
>> to run non-MPI programs over the lambooted nodes.
>>
>> -----------------------------------------------------------------------------
>>
>>
>>
>>
>>
>> On Thu, Feb 26, 2009 at 8:22 PM, Abhirup Chakraborty <
>> abhirupc91_at_[hidden]> wrote:
>>
>>> Try adding the directory (/paracel/lam/bin/) to the PATH environment
>>> variable.
>>> I hope it will work.
>>>
>>>
>>> On Thu, Feb 26, 2009 at 4:56 AM, Bhanu Prakash <sneham02i_at_[hidden]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm working with LAM 7.1.2.
>>>>
>>>> LAM is installed at /paracel/lam.
>>>>
>>>> This is the error message I receive when I run my application:
>>>>
>>>> 0 - MPI_SEND : Invalid rank 1
>>>> [0] Aborting program !
>>>> [0] Aborting program!
>>>>
>>>> -----------------------------------------------------------------------------
>>>> It seems that [at least] one of the processes that was started with
>>>> mpirun did not invoke MPI_INIT before quitting (it is possible that
>>>> more than one process did not invoke MPI_INIT -- mpirun was only
>>>> notified of the first one, which was on node n0).
>>>>
>>>> mpirun can *only* be used with MPI programs (i.e., programs that
>>>> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
>>>> to run non-MPI programs over the lambooted nodes.
>>>>
>>>> -----------------------------------------------------------------------------
>>>>
>>>> When I run "which mpirun", I get the following output:
>>>>
>>>> On the headnode:
>>>> [webstructure_at_hpc~]$ which mpirun
>>>> /paracel/lam/bin/mpirun
>>>>
>>>> From the headnode, via ssh:
>>>> [webstructure_at_hpc~]$ ssh compute-0-0.local which mpirun
>>>> /usr/bin/mpirun
>>>>
>>>> On the compute node itself, after logging in:
>>>> [webstructure_at_hpc~]$ ssh compute-0-0.local
>>>> Last login: Thu Feb 26 14:31:19 2009 from hpc
>>>> Rocks Compute Node
>>>> Rocks 5.0 (V)
>>>> Profile built 17:45 03-Sep-2008
>>>>
>>>> Kickstarted 17:58 03-Sep-2008
>>>> [webstructure_at_compute-0-0 ~]$ which mpirun
>>>> /paracel/lam/bin/mpirun
>>>>
>>>>
>>>> I feel this is the reason mpirun is not working.
>>>>
>>>> Please suggest how I can solve this issue.
>>>>
>>>> Thanks in advance.
>>>>
>>>> --
>>>> Don't go where the path may lead, go
>>>> instead where there is no path and leave
>>>> a trail.
>>>>
>>>> _______________________________________________
>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>>
>>>
>>>
>>
>>
>>
>
>
