
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-08-06 08:35:45


There's really no way to say for sure without debugging the actual
program.

One common misconception about parallel MPI applications is that
they're "too complicated to debug." Remember: at heart, an MPI
application is just a collection of serial processes running side by
side. Each of these processes follows the same rules of
C/Fortran/your-favorite-language-here that it would in a non-parallel
program. Sometimes bugs are quite easy to find because the code does
something silly, like dereferencing a NULL pointer; the issue may have
nothing to do with MPI at all. MPI is just a library that provides some
extra functionality (message passing and whatnot).

I certainly can't say this for sure without looking at the code and
seeing what happens when you run it, but if your program crashes with
several different MPI implementations, then the problem may lie in the
logic of the program itself (and not in how MPI is used). You might
want to try a memory-checking debugger such as valgrind.

Good luck.

On Aug 6, 2004, at 10:06 AM, Gkikas Magiorkinis wrote:

> I found out that the package I compiled had an mpi.h file of its own,
> pulled into the source code through the line #include "mpi.h". I
> changed that line to #include <mpi.h> to bypass that problem (so that
> LAM's mpi.h is included instead).
>
> Unfortunately, another problem appeared: the program stops during the
> execution of a specific procedure. I have compiled the program with:
> LAM 6.5.9
> LAM 7.0.6
> MPICH 1.2.5.2
> and it persistently crashes while trying to execute this specific
> procedure.
> I have tried to contact the authors of the program, but they never
> replied, so I still do not know what to do. I am pretty sure it is a
> minor bug, because I have compiled this program in the past on an
> HP-UX machine and I know that it worked fine there. I think that HP has
> its own implementation of MPI. Do you know of any common
> cross-platform inconsistencies? Anything I could try?
>
> Once more thank you very much,
>
> Gkikas
>
> -----Original Message-----
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Friday, August 06, 2004 2:02 PM
> To: General LAM/MPI mailing list
> Subject: Re: LAM: Invalid communicator
>
> Just as the error message says, it means that you're calling
> MPI_COMM_SIZE with an invalid communicator. ;-)
>
> This can happen if you MPI_COMM_FREE a communicator and then call
> MPI_COMM_SIZE with that communicator, or you pass in NULL to
> MPI_COMM_SIZE, or something similar to that.
>
>
> On Aug 6, 2004, at 4:23 AM, Gkikas Magiorkinis wrote:
>
>> Hi again,
>>
>> I have finally installed LAM and compiled a program, which outputs:
>>
>> MPI_Comm_size: invalid communicator: Invalid argument (rank 0, MPI_COMM_WORLD)
>> MPI_Comm_size: invalid communicator: Invalid argument (rank 1, MPI_COMM_WORLD)
>> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (1, MPI_COMM_WORLD): - MPI_Comm_size()
>> Rank (1, MPI_COMM_WORLD): - main ()
>> MPI_Comm_size: invalid communicator: Invalid argument (rank 0, MPI_COMM_WORLD)
>>
>> Does anyone have any idea why this could be happening?
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/