On Mar 29, 2005, at 1:00 AM, Guanhua Yan wrote:
> Josh,
>
> Thank you for your suggestions. I double-checked the first three and
> could not find any problem. I will talk about the fourth point with my
> colleagues tomorrow.
The fourth point is really the crux. This error message is usually
caused by a process exiting before it calls MPI_Init, but it also
appears when mpirun is used to launch a non-MPI program (one that never
calls MPI_Init).
> In addition, I tried to run the program in different ways:
>
> 1) mpirun -np 1 ../../../run
> 2) mpirun -np 1 ../../../../../p2/p1/run
>
> in which ../../../run and ../../../../../p2/p1/run point to the same
> executable program. However, 1) generates the error and 2) does not.
> Any idea about this?
This is peculiar. Did you try both scenarios multiple times? Try
passing the full (absolute) path of the executable to mpirun. You may
also want to run a 'hello world' style MPI application
[http://www.lam-mpi.org/tutorials/nd/part1/lab1.c] just to confirm for
yourself that everything is running properly.
Josh
>
> thanks,
> Guanhua
>
> On Monday 28 March 2005 22:41, Josh Hursey wrote:
>> Guanhua,
>> The error message could be the result of a few things. Here is a short
>> list of items to check:
>>
>> 1. lamboot should not be causing this problem, but it is good to make
>> sure that everything booted properly. You can see lamboot's progress
>> with 'lamboot -v', then 'lamnodes' will let you see the nodes that
>> have been booted.
>>
>> 2. Make sure that your environment is set properly and points to the
>> correct LAM/MPI binaries. If you have a couple of installations (say
>> an RPM'ed image from RH, and a self installation in $HOME/local) then
>> you will want to put the installation of the version you want to use
>> first in your path (e.g. export PATH=$HOME/local/bin:$PATH).
>>
>> 3. Make sure you compiled your MPI program with the version of
>> 'mpicc/mpic++/mpif77' corresponding to the 'mpirun' command that you
>> used. ('which mpicc', 'which mpirun')
>>
>> 4. Check your code for places where the program may have exited
>> before calling MPI_Init() (per the error message). It is suggested
>> that you call MPI_Init as early as possible in your MPI program.
>>
>> Give some of those a try, and let me know if that helps.
>>
>> Josh
>>
>> On Mar 28, 2005, at 6:57 PM, Guanhua Yan wrote:
>>> Hi all,
>>>
>>> I ran into some strange problems when using LAM. I hope some
>>> experienced people can give me a hand.
>>>
>>> A month ago, I used LAM on my laptop successfully. Later, I was
>>> interrupted by something else. Yesterday, when I resumed work on my
>>> parallelization code, something fishy happened:
>>>
>>> The code that ran before no longer works. When I use
>>> "mpirun -np 1 <executable>", I always get the following output:
>>>
>>> -----------------------------------------------------------------------------
>>> It seems that [at least] one of the processes that was started with
>>> mpirun did not invoke MPI_INIT before quitting (it is possible that
>>> more than one process did not invoke MPI_INIT -- mpirun was only
>>> notified of the first one, which was on node n0).
>>>
>>> mpirun can *only* be used with MPI programs (i.e., programs that
>>> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
>>> to run non-MPI programs over the lambooted nodes.
>>> -----------------------------------------------------------------------------
>>>
>>> I am pretty sure that "lamboot" completed successfully. The LAM
>>> version is 7.0.6 and the gcc version is "3.2.2". I did observe
>>> "gcc: invalid version number format" in the configuration log, but I
>>> am not sure whether this is the real cause.
>>>
>>> thanks,
>>> Guanhua
>>> _______________________________________________
>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>> ----
>> Josh Hursey
>> jjhursey_at_[hidden]
>> http://www.lam-mpi.org/
>>
----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/