LAM/MPI General User's Mailing List Archives

From: Guanhua Yan (ghyan_at_[hidden])
Date: 2005-03-29 01:00:19


Josh,

Thank you for your suggestions. I double-checked the first three and could not
find any problem. I will talk about the fourth point with my colleagues
tomorrow. In addition, I tried to run the program in different ways:

1) mpirun -np 1 ../../../run
2) mpirun -np 1 ../../../../../p2/p1/run

Here ../../../run and ../../../../../p2/p1/run point to the same
executable program, yet 1) produces the error and 2) does not. Any idea
why?
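
One way to confirm that the two relative paths really do end up at the same
file is to resolve both from the directory where mpirun is invoked. The
following is only a minimal sketch: the helper names describe() and
same_file() are made up for illustration, and it checks nothing about how
LAM itself locates the executable.

```python
import os


def describe(path, cwd=None):
    """Resolve a possibly relative path against cwd and report what it points at."""
    # Assumption: cwd is the directory from which mpirun would be started.
    base = cwd if cwd is not None else os.getcwd()
    resolved = os.path.realpath(os.path.join(base, path))
    return {
        "given": path,
        "resolved": resolved,  # absolute path with symlinks followed
        "exists": os.path.exists(resolved),
        "executable": os.path.isfile(resolved) and os.access(resolved, os.X_OK),
    }


def same_file(path_a, path_b, cwd=None):
    """Return True only if both paths exist and refer to the same file on disk."""
    a = describe(path_a, cwd)
    b = describe(path_b, cwd)
    if not (a["exists"] and b["exists"]):
        return False
    return os.path.samefile(a["resolved"], b["resolved"])
```

Calling same_file("../../../run", "../../../../../p2/p1/run") from the working
directory used for the two mpirun commands, and printing describe() for each
path, would show whether a symbolic link or a stale copy sits behind one of
them.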

thanks,
Guanhua

On Monday 28 March 2005 22:41, Josh Hursey wrote:
> Guanhua,
> The error message could be the result of a few things. Here is a short
> list of items to check:
>
> 1. lamboot should not be causing this problem, but it is good to make
> sure that everything booted properly. You can see lamboot's progress
> with 'lamboot -v', then 'lamnodes' will let you see the nodes that have
> been booted.
>
> 2. Make sure that your environment is set properly, and pointing to the
> correct binaries of LAM/MPI. If you have a couple of installations (say
> an RPM'ed image from RH, and a self installation in $HOME/local) then
> you will want to put the installation of the version you want to use
> first in your path (e.g. export PATH=$HOME/local/bin:$PATH).
>
> 3. Make sure you compiled your MPI program with the version of
> 'mpicc/mpic++/mpif77' corresponding to the 'mpirun' command that you
> used. ('which mpicc', 'which mpirun')
>
> 4. Check your code looking for places where the program may have exited
> before calling MPI_Init() (per the error message). It is suggested that
> you call MPI_Init as early as possible in your MPI program.
>
> Give some of those a try, and let me know if that helps.
>
> Josh
>
> On Mar 28, 2005, at 6:57 PM, Guanhua Yan wrote:
> > Hi all,
> >
> > I have run into some strange problems using LAM and hope someone more
> > experienced can give me a hand.
> >
> > A month ago, I used LAM on my laptop successfully. Then I was
> > interrupted by something else. Yesterday, when I resumed work on my
> > parallelization code, something fishy happened:
> >
> > The code that ran before does not work now. When I try
> > "mpirun -np 1 <executable>", I always get the following output:
> >
> > -----------------------------------------------------------------------------
> > It seems that [at least] one of the processes that was started with
> > mpirun did not invoke MPI_INIT before quitting (it is possible that
> > more than one process did not invoke MPI_INIT -- mpirun was only
> > notified of the first one, which was on node n0).
> >
> > mpirun can *only* be used with MPI programs (i.e., programs that
> > invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> > to run non-MPI programs over the lambooted nodes.
> > -----------------------------------------------------------------------------
> >
> > I am pretty sure that "lamboot" completed successfully. The LAM
> > version is 7.0.6 and the gcc version is "3.2.2". I did see "gcc: invalid
> > version number format" in the configuration log, but I am not sure
> > whether that is the real cause.
> >
> > thanks,
> > Guanhua
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
> ----
> Josh Hursey
> jjhursey_at_[hidden]
> http://www.lam-mpi.org/