I realize that I am replying to a *really* old post here, but this exact
same problem just came up in a different context.
When stepping through the code with a debugger, it looks like the C++ class
is somehow not passing the underlying MPI_Comm correctly -- it passes a
totally bogus value to the underlying C function. Sometimes this bogus
value is NULL, sometimes it is non-NULL. The NULL values result in
what you are seeing (because MPI_Comm_rank() checks for NULL communicator
values); the non-NULL values result in a segv (because
MPI_Comm_rank() will try to use that as a valid communicator).
I notice that this does not happen with any other version of the Intel
compiler (including 9.0). As such, right now I'm treating this as a bug in
the Intel 9.1 compiler (it may still be an OMPI problem, but as of now, I
don't see where...). I filed a bug about this with Intel Premier Support
about a week ago; we'll see what they say.
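In the meantime, one workaround that may be worth trying is to bypass the
C++ bindings entirely and call the C API directly. Here's a sketch of the
same test program using only the C bindings -- this assumes the problem is
confined to the MPI_Comm handle passed by the C++ wrapper (which is what
the debugger output suggests), so I can't promise it helps:

```cpp
// Same test as the original program, but via the C API, so no C++
// wrapper sits between us and MPI_Comm_rank()/MPI_Comm_size().
#include <iostream>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int rank = 0, ncount = 0;
    // MPI_COMM_WORLD is passed directly here, with no C++ class in
    // between that could hand a bogus handle to the C layer.
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ncount);

    std::cerr << "rank " << rank << std::endl;
    std::cerr << "ncount " << ncount << std::endl;

    MPI_Finalize();
    return 0;
}
```

If this version still fails under the icc 9.1 build, then the problem is
not limited to the C++ bindings and the compiler theory above needs
revisiting.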
On 6/28/06 4:10 PM, "William Pughe" <wlp_at_[hidden]> wrote:
> Hello all. I'm having problems running a simple program. I built lam
> 7.1.1 with the intel 9.1 compiler. When running the following program:
> #include <iostream>
> #include <mpi.h>
>
> using namespace std;
>
> int main()
> {
>   // Do initial MPI setup stuff...
>   MPI::Init();
>   int rank = MPI::COMM_WORLD.Get_rank();
>   int ncount = MPI::COMM_WORLD.Get_size();
>   cerr << "rank " << rank << endl;
>   cerr << "ncount " << ncount << endl;
>   MPI::Finalize();
>   return 0;
> }
>
> I get the following error:
>
> mpirun -np 1 ./prog
> MPI_Comm_rank: invalid communicator: Invalid argument (rank 0,
> MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Comm_rank()
> Rank (0, MPI_COMM_WORLD): - main()
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 26802 failed on node n0 (127.0.0.1) with exit status 22.
> -----------------------------------------------------------------------------
>
> If I use the 9.1 compiler and the version of lam I built with the 9.0
> compiler, everything works fine.
> icpc -I/ll/3rdptysw/x86_64-Linux-2.4-icc-9.0/include -pthread prog.C -o
> prog -L/ll/3rdptysw/x86_64-Linux-2.4-icc-9.0/lib -llammpio -llammpi++
> -llamf77mpi -lmpi -llam -laio -laio -lutil -ldl
>
> When I compile with lam built with 9.1 it breaks.
> icpc -I/ll/3rdptysw/x86_64-Linux-2.4-icc-9.1/include -pthread prog.C -o
> prog -L/ll/3rdptysw/x86_64-Linux-2.4-icc-9.1/lib -llammpio -llammpi++
> -llamf77mpi -lmpi -llam -laio -laio -lutil -ldl
>
>
> laminfo output:
> LAM/MPI: 7.1.1
> Prefix: /ll/3rdptysw/x86_64-Linux-2.4-icc-9.1
> Architecture: x86_64-unknown-linux-gnu
> Configured by: wlp
> Configured on: Tue Jun 27 17:29:45 EDT 2006
> Configure host: adam
> Memory manager: ptmalloc2
> C bindings: yes
> C++ bindings: yes
> Fortran bindings: yes
> C compiler: icc
> C++ compiler: icpc
> Fortran compiler: g77
> Fortran symbols: double_underscore
> C profiling: yes
> C++ profiling: yes
> Fortran profiling: yes
> C++ exceptions: no
> Thread support: yes
> ROMIO support: yes
> IMPI support: no
> Debug support: no
> Purify clean: no
> SSI boot: globus (API v1.1, Module v0.6)
> SSI boot: rsh (API v1.1, Module v1.1)
> SSI boot: slurm (API v1.1, Module v1.0)
> SSI coll: lam_basic (API v1.1, Module v7.1)
> SSI coll: shmem (API v1.1, Module v1.0)
> SSI coll: smp (API v1.1, Module v1.2)
> SSI rpi: crtcp (API v1.1, Module v1.1)
> SSI rpi: lamd (API v1.0, Module v7.1)
> SSI rpi: sysv (API v1.0, Module v7.1)
> SSI rpi: tcp (API v1.0, Module v7.1)
> SSI rpi: usysv (API v1.0, Module v7.1)
> SSI cr: self (API v1.0, Module v1.0)
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems