Thanks. I avoided the problem by using another version of the code. But the
direct relative path (../../../run) still does not work. I need to specify a
path that goes up to the parent directory and then back down to the directory
where the executable is located. Although I am still curious why this
happens, it does not prevent me from continuing my work.
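
For anyone who wants to check the same thing, here is a minimal C sketch,
assuming a POSIX system, that tests whether two paths name the same file as
seen from the current working directory. The helper name same_file is my own
and is not part of LAM/MPI. It only shows how the paths resolve in the
directory where it is run; it says nothing about how mpirun or the LAM daemon
resolves them.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of LAM/MPI).
 * Returns 1 if both paths resolve to the same file (same device and inode),
 * 0 if they resolve to different files, and -1 if either path cannot be
 * resolved from the current working directory. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0) {
        perror(a);              /* report which path failed to resolve */
        return -1;
    }
    if (stat(b, &sb) != 0) {
        perror(b);
        return -1;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Calling same_file("../../../run", "../../../../../p2/p1/run") from a small
main(), in the directory where mpirun is invoked, should return 1 if both
paths really reach the same executable from there.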
- Guanhua
On Tuesday 29 March 2005 06:53, Josh Hursey wrote:
> On Mar 29, 2005, at 1:00 AM, Guanhua Yan wrote:
> > Josh,
> >
> > Thank you for your suggestions. I double-checked the first three and
> > could not find any problem. I will talk about the fourth point with my
> > colleagues tomorrow.
>
> The fourth point is really the crux. This error message is usually
> caused by a process exiting before MPI_Init, but is also caused by
> running mpirun on a non-MPI program [a program that never calls
> MPI_Init].
>
> > In addition, I tried to run the program in different ways:
> >
> > 1) mpirun -np 1 ../../../run
> > 2) mpirun -np 1 ../../../../../p2/p1/run
> >
> > in which ../../../run and ../../../../../p2/p1/run point to the same
> > executable program. However, 1) generates the error but 2) does not.
> > Any idea why?
>
> This is peculiar. Did you try both scenarios multiple times? Try
> passing the full path to mpirun. You may also want to run a 'hello
> world' style of MPI application
> [http://www.lam-mpi.org/tutorials/nd/part1/lab1.c] just to confirm for
> yourself that everything is running properly.
>
> Josh
>
> > thanks,
> > Guanhua
> >
> > On Monday 28 March 2005 22:41, Josh Hursey wrote:
> >> Guanhua,
> >> The error message could be the result of a few things. Here is a short
> >> list of items to check:
> >>
> >> 1. lamboot should not be causing this problem, but it is good to make
> >> sure that everything booted properly. You can see lamboot's progress
> >> with 'lamboot -v', then 'lamnodes' will let you see the nodes that
> >> have been booted.
> >>
> >> 2. Make sure that your environment is set properly, and pointing to
> >> the correct binaries of LAM/MPI. If you have a couple of installations
> >> (say an RPM'ed image from RH, and a self installation in $HOME/local)
> >> then you will want to put the installation of the version you want to
> >> use first in your path (e.g. export PATH=$HOME/local/bin:$PATH).
> >>
> >> 3. Make sure you compiled your MPI program with the version of
> >> 'mpicc/mpic++/mpif77' corresponding to the 'mpirun' command that you
> >> used. ('which mpicc', 'which mpirun')
> >>
> >> 4. Check your code looking for places where the program may have
> >> exited before calling MPI_Init() (per the error message). It is
> >> suggested that you call MPI_Init as early as possible in your MPI
> >> program.
> >>
> >> Give some of those a try, and let me know if that helps.
> >>
> >> Josh
> >>
> >> On Mar 28, 2005, at 6:57 PM, Guanhua Yan wrote:
> >>> Hi all,
> >>>
> >>> I have run into some strange problems when using LAM. I hope some
> >>> experienced people can give me a hand.
> >>>
> >>> A month ago, I used LAM on my laptop successfully. Later, I was
> >>> interrupted by something else. Yesterday, when I resumed work on my
> >>> parallelization code, something fishy happened:
> >>>
> >>> The code that ran before no longer works. When I tried to use
> >>> "mpirun -np 1 <executable>", I always got the following output:
> >>>
> >>> -----------------------------------------------------------------------------
> >>> It seems that [at least] one of the processes that was started with
> >>> mpirun did not invoke MPI_INIT before quitting (it is possible that
> >>> more than one process did not invoke MPI_INIT -- mpirun was only
> >>> notified of the first one, which was on node n0).
> >>>
> >>> mpirun can *only* be used with MPI programs (i.e., programs that
> >>> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> >>> to run non-MPI programs over the lambooted nodes.
> >>> -----------------------------------------------------------------------------
> >>>
> >>> I am pretty sure that "lamboot" completed successfully. The LAM
> >>> version is 7.0.6 and the gcc version is "3.2.2". I did observe "gcc:
> >>> invalid version number format" in the configuration log, but I am not
> >>> sure whether this is the real cause.
> >>>
> >>> thanks,
> >>> Guanhua
> >>> _______________________________________________
> >>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >>
> >> ----
> >> Josh Hursey
> >> jjhursey_at_[hidden]
> >> http://www.lam-mpi.org/
>
> ----
> Josh Hursey
> jjhursey_at_[hidden]
> http://www.lam-mpi.org/