
LAM/MPI General User's Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-06-08 00:07:07


Another idea (in addition to the gdb question in my previous e-mail)...

Did you recompile your application after upgrading? Unfortunately,
LAM 6.3 and 7.1 are not interoperable, so you will need to recompile
and relink your application for things to work.

Brian

On May 25, 2006, at 8:40 AM, YoungHui Amend wrote:

> I've narrowed this problem down to mpirun.c in the otb/mpirun directory.
> That file contains the get_mpi_world function. After calling nrecv(msg),
> it performs the following check:
> if (msg.nh_type == 1) {
>     char node[32];
>     if (fl_very_verbose)
>         printf("mpirun: someone died before MPI_INIT -- rank %d\n",
>                msg.nh_node);
>     snprintf(node, sizeof(node), "%d", msg.nh_node);
>     show_help("mpirun", "no-init", node, NULL);
>     errno = EMPINOINIT;
>     return LAMERROR;
> }
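A minimal, self-contained sketch of what the quoted check does, assuming only the two fields it reads. `struct startup_msg` and `check_startup_msg()` are hypothetical stand-ins, not LAM's real message structure or API, and -1 plays the role of LAMERROR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in holding only the two fields the quoted check
   reads; LAM's real message structure has more members. */
struct startup_msg {
    int nh_type; /* 1 is treated as "a process died before MPI_INIT" */
    int nh_node; /* rank reported in that message */
};

/* Mirrors the quoted logic: on nh_type == 1, write the rank as a
   decimal string into node and return -1 (standing in for LAMERROR);
   otherwise return 0 and leave node unchanged. */
int check_startup_msg(const struct startup_msg *msg, char *node, size_t len)
{
    assert(msg != NULL && node != NULL && len > 0);
    if (msg->nh_type == 1) {
        /* snprintf writes at most len bytes, terminating '\0' included */
        snprintf(node, len, "%d", msg->nh_node);
        return -1;
    }
    return 0;
}
```

With `char node[32]` and a type-1 message from rank 3, the call returns -1 and leaves "3" in `node`; in the quoted code that string is what gets passed to show_help() for the "no-init" message.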
> When is nh_type set to 1 when the nsend command is issued?
> One difference I noticed is that PTY_IS_DEFAULT is 0 in 6.3 but 1 in
> 7.1. What does PTY support do?
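As far as I understand LAM's mpirun(1) man page, PTY support attaches the remote processes' standard output to a pseudo-terminal instead of a pipe, mainly so output is line-buffered; LAM 7 enables it by default, and its mpirun appears to offer a `-npty` switch to turn it off (verify against the man page for your build). A minimal POSIX sketch of what that changes from the application's side; `fd_is_terminal()` and `stdout_buffering()` are hypothetical helpers:

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if fd is attached to a terminal, which includes the slave
   side of a pseudo-terminal (pty), and 0 for pipes, files, or sockets. */
int fd_is_terminal(int fd)
{
    assert(fd >= 0);
    return isatty(fd) ? 1 : 0;
}

/* C stdio fully buffers stdout only when it is known not to be an
   interactive device; glibc line-buffers it on a terminal. So output
   timing can differ with and without a pty. */
const char *stdout_buffering(void)
{
    return fd_is_terminal(STDOUT_FILENO) ? "line-buffered (terminal/pty)"
                                         : "fully buffered (pipe/file)";
}
```

This only illustrates the buffering side effect; it does not show how mpirun itself sets up the pty.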
>
> I would appreciate any help you can give me.
> Thank you for your prompt attention,
> YoungHui Amend
>
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On
> Behalf Of YoungHui Amend
> Sent: Wednesday, May 24, 2006 9:51 AM
> To: lam_at_[hidden]
> Subject: LAM: LAM error: Unknown error 471
>
> Hi,
>
> I'm in the process of upgrading from version 6.3 to 7.1.
>
> I have LAM daemons running on my master and slave machines. When I
> run mpirun with an application schema, I get:
> MPI_Init: LAM error: Unknown error 471
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
> ///////////////////////////////////////////////////////////////////////////////////////
> My master and slave processes do call MPI_Init. I think the error 471
> message comes from the slave process, which therefore quits before my
> master process gets a chance to call MPI_Init; that in turn produces
> the message about not invoking MPI_INIT before quitting.
>
> This part of the code works fine with version 6.3. Are there changes
> between releases that I'm not aware of?
> I've seen conflicting documentation: one document says MPI_Init must
> be called by all processes, while another help file says the master or
> one of the slave machines needs to call MPI_Init. In either case, what
> is "Unknown error 471", and which part of the LAM/MPI source code does
> it come from?
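The "Unknown error 471" wording matches what glibc's strerror() returns for a number that is not in its errno table. That suggests, though it is only an assumption and not confirmed from LAM's source, that 471 is a LAM-internal error code being printed through the C library's table rather than through a LAM-specific description. A minimal sketch of the effect; `format_libc_error()` is a hypothetical helper, not a LAM function:

```c
#include <assert.h>
#include <errno.h>  /* EINVAL, used in the usage note */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a code the way a "LAM error: ..." line
   would look if its text came straight from the C library's strerror().
   Codes the C library does not know get its generic fallback text. */
int format_libc_error(char *buf, size_t len, int code)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "LAM error: %s", strerror(code));
}
```

On glibc, `format_libc_error(buf, sizeof buf, 471)` yields the same "LAM error: Unknown error 471" text as the report, while `EINVAL` yields a real description; other C libraries may use a different fallback string.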
>
> Here is my command:
> mpirun -t -c2c -O -w -x $LAM_EXPORT myapp
> where LAM_EXPORT=PATH,LD_LIBRARY_PATH,DISPLAY,LAMHOME
> The myapp file contains:
> n0 /afs/tda/sti/r33/prod/linux24_64/tools/tb/bin-64/TWTgen parallelprocess=yes experiment=ya lbist=yes
> n1 /afs/tda/sti/r33/prod/linux24_64/tools/tb/bin-64/TWTgenfm experiment=ya lbist=yes parallelprocess=yes
> I'm running TWTgen on the master and TWTgenfm on the slave. They
> are the same program with different entry points.

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/