This is exactly the bug that remains in the 7.1.x tree with regard
to IMPI support -- it is what is preventing us from releasing
7.1.2. Some deep voodoo with respect to IMPI during MPI_INIT is
not occurring properly when the collective algorithms are set up.

The SVN trunk/ and branch-7-1/ branches are effectively the same
(hence, 7.2b1 and the latest 7.1.2 beta are pretty much the same)
-- they both have this startup bug.

If you need to use IMPI, you should probably roll back to 7.0.6 for
the time being. Sorry!
On Dec 6, 2005, at 11:17 PM, Angel Villalain wrote:
> Hi all
>
> My problem is the following: I am trying to run a simple MPI code using
> IMPI support. I installed LAM 7.2b1r10223 because I found on your
> mailing list that there were problems related to IMPI in LAM version
> 7.1.1.
> Once it was installed, I downloaded the IMPI server v1.3 and all tests
> passed. But the problem comes when I try to run a simple hello world
> code.
> First I run the server
> ./impi_server -v -server 1 -p 8000
> Output
> ****************************
> Authentication not specified on command line.
> Checking environment variables.
> IMPI_AUTH_NONE enabled.
> server_auths[0] = 0
> IMPI server version 0 started on host komolongma.ece.uprm.edu
> IMPI server listening on port 8000 for 1 connection(s).
> xxx.xxx.xxx.x:8000
> IMPI server: Entering main server loop.
>
> Then I run the client
> impirun -client 0 xxx.xxx.xxx.x:8000 -np 2 hello
> But then the client does not print any output and looks like it is not
> doing anything. The server does not print any acknowledgment of what
> is happening. So I kill the client and the output is the
> following
> Output
> ****************************
> MPI_Recv: process in local group is dead (rank 2, comm 12)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> Rank (0, MPI_COMM_WORLD): - MPI_Allreduce()
> Rank (0, MPI_COMM_WORLD): - MPI_Comm_split()
> Rank (0, MPI_COMM_WORLD): - MPI_Comm_split()
> Rank (0, MPI_COMM_WORLD): - MPI_Intercomm_merge()
> Rank (0, MPI_COMM_WORLD): - MPI_Init()
> Rank (0, MPI_COMM_WORLD): - main()
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 4527 failed on node n0 (10.1.1.1) due to signal 15.
> -----------------------------------------------------------------------------
>
> And the code is the following
> #include <stdio.h>
> #include "mpi.h"
>
> int main (int argc, char *argv[])
> {
> int id, np;
> char name[MPI_MAX_PROCESSOR_NAME];
> int namelen;
> int i;
>
> MPI_Init (&argc, &argv);
>
> MPI_Comm_size (MPI_COMM_WORLD, &np);
> MPI_Comm_rank (MPI_COMM_WORLD, &id);
> MPI_Get_processor_name (name, &namelen);
>
> printf ("This is Process %2d out of %2d running on host %s\n", id, np, name);
>
> MPI_Finalize ();
>
> return (0);
> }
>
> Is there anyone who can help with this?
> Thanks in advance.
> --
> La muerte sera una liberacion frente a esta parodia
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/