LAM/MPI General User's Mailing List Archives

From: YoungHui Amend (yamend_at_[hidden])
Date: 2006-05-25 10:40:35


I've narrowed this problem down to mpirun.c in the otb/mpirun directory.
In this file there is a get_mpi_world function. After it calls nrecv(msg),
it performs the following check:
    if (msg.nh_type == 1) {
      char node[32];
      if (fl_very_verbose)
        printf("mpirun: someone died before MPI_INIT -- rank %d\n",
               msg.nh_node);
      snprintf(node, sizeof(node), "%d", msg.nh_node);
      show_help("mpirun", "no-init", node, NULL);
      errno = EMPINOINIT;
      return LAMERROR;
    }

Under what circumstances is nh_type set to 1 when nsend is called?

One of the differences I noticed is that PTY_IS_DEFAULT is 0 in 6.3
but 1 in 7.1. What does the PTY support do?
 
I would appreciate any help you can give me.
Thank you for your prompt attention,
YoungHui Amend

________________________________

        From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of YoungHui Amend
        Sent: Wednesday, May 24, 2006 9:51 AM
        To: lam_at_[hidden]
        Subject: LAM: LAM error: Unknown error 471
        
        
        Hi,
         
        I'm in the process of upgrading from version 6.3 to 7.1.
         
        I've got lam daemons running on my master and slave machines.
        When I then run mpirun with an application schema, I get:
        MPI_Init: LAM error: Unknown error 471
        
        -----------------------------------------------------------------------------
        It seems that [at least] one of the processes that was started with
        mpirun did not invoke MPI_INIT before quitting (it is possible that
        more than one process did not invoke MPI_INIT -- mpirun was only
        notified of the first one, which was on node n0).
         
        mpirun can *only* be used with MPI programs (i.e., programs that
        invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
        to run non-MPI programs over the lambooted nodes.
        -----------------------------------------------------------------------------
        
        ///////////////////////////////////////////////////////////////////////////////////////
        My master and slave processes do call MPI_Init. I think the
        error message for 471 is coming from the slave processes, which
        therefore quit before my master process gets a chance to call
        MPI_Init; that in turn generates the message about not invoking
        MPI_INIT before quitting.
         
        This part of the code works fine with version 6.3. Are there
        some changes between releases that I'm not aware of?
        I've seen some conflicting documentation: one source says MPI_Init
        needs to be called by all processes, while another help file says
        the master or one of the slave machines needs to call MPI_Init.
        In either case, what is the Unknown error 471, and which LAM/MPI
        source file is it coming from?
         
        Here is my command:
        mpirun -t -c2c -O -w -x $LAM_EXPORT myapp
        where LAM_EXPORT=PATH,LD_LIBRARY_PATH,DISPLAY,LAMHOME
        The myapp file contains:
                  n0 /afs/tda/sti/r33/prod/linux24_64/tools/tb/bin-64/TWTgen parallelprocess=yes experiment=ya lbist=yes
                  n1 /afs/tda/sti/r33/prod/linux24_64/tools/tb/bin-64/TWTgenfm experiment=ya lbist=yes parallelprocess=yes
        
        I'm running TWTgen on the master and TWTgenfm on the slave.
        They are the same program with a different calling entry.