LAM/MPI General User's Mailing List Archives

From: Adams, Samuel D Contr AFRL/HEDR (Samuel.Adams_at_[hidden])
Date: 2007-05-11 14:26:31


Sorry, you were right. I thought I had commented out all of the MPI
communication except what I posted below, but it turned out another
little function was hiding out, sending a couple of floats with a
negative tag. I forgot about that one. For some reason, I was thinking
the tag only had to be an int, not necessarily a non-negative one.
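
(For reference: tags must be non-negative and no larger than the value
of the MPI_TAG_UB attribute, which the MPI standard guarantees is at
least 32767. Here is a quick, untested sketch that queries the actual
limit at runtime; it uses only standard MPI-1 calls, nothing
LAM-specific:)

#include <stdio.h>
#include <mpi.h>

/* Untested sketch: print the largest tag value this MPI implementation
   allows. The standard guarantees MPI_TAG_UB is at least 32767. */
int main(int argc, char **argv)
{
        int *tag_ub, flag;

        MPI_Init(&argc, &argv);
        MPI_Attr_get(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
        if (flag)
                printf("maximum allowed tag: %d\n", *tag_ub);
        MPI_Finalize();
        return 0;
}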

Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf
Of Jeff Squyres
Sent: Thursday, May 10, 2007 8:53 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: MPI error message

Well, that's pretty kooky. :-(

Here's the code from MPI_SEND that's generating the error:

        if (tag < 0 || tag > lam_mpi_max_tag) {
                return(lam_err_comm(comm, MPI_ERR_TAG, EINVAL,
                                    "out of range"));
        }

But according to your code, that can't be happening because your tags
are fixed positive integers (lam_mpi_max_tag is at least 32k).

Are you absolutely certain that this is where the problem is occurring?

You might want to either run this through a debugger to verify a)
that this is where the problem is occurring, and b) what LAM thinks
it's getting as a tag value. Or you could write some quick MPI_Send /
MPI_Recv intercept functions that use the PMPI layer, perhaps
something like this:

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

/* Intercept MPI_Send and hand the real work to PMPI_Send. */
int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest,
             int tag, MPI_Comm comm)
{
        if (tag < 0 || tag > 32767) {
                char host[4096];
                int i = 0;

                gethostname(host, sizeof(host));
                printf("%s:%d: got invalid tag in MPI_Send! %d\n",
                       host, (int) getpid(), tag);
                /* Spin so you can attach a debugger to this process. */
                while (i == 0) sleep(5);
        }
        return PMPI_Send(buf, count, dtype, dest, tag, comm);
}

(disclaimer: typed in e-mail; not verified!)

This will print out the host/pid of the offending process(es) and
pause, allowing you to attach a debugger. Modify the inner part of
the block to suit your particular debugging tastes.
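
(For example, once a rank has printed its host/pid and gone to sleep,
you should be able to attach from that host with something like
"gdb -p <pid>", use "set var i = 1" to break out of the loop, and then
walk up the stack to see where the bad tag is coming from.)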

On May 9, 2007, at 12:12 PM, Adams, Samuel D Contr AFRL/HEDR wrote:

> I am getting this error when I run my code with LAM. I was using this
> code on another system running a slightly older MPICH and didn't get
> any errors like this. It would seem there is something wrong with the
> way I am sending and receiving slices. Can you see anything obviously
> wrong with the way I am doing this?
>
> * Starting updates
> * cycle 1
> MPI_Recv: invalid tag argument: Invalid argument (rank 0, MPI_COMM_WORLD)
> MPI_Send: invalid tag argument: Invalid argument: out of range (rank 1, MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Send()
> Rank (1, MPI_COMM_WORLD): - main()
> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> Rank (0, MPI_COMM_WORLD): - main()
> --------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 22373 failed on node n0 (127.0.0.1) with exit status 22.
> --------------------------------------------------------------------------
> mpirun failed with exit status 22
>
> ===========================code=========================================
> void hSndRcv() {
>     if (my_rank != comm_size - 1) {
>         MPI_Send(h_x + Z_OFFSET(my_dim_z), (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank + 1, 3, MPI_COMM_WORLD);
>         MPI_Send(h_y + Z_OFFSET(my_dim_z), (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank + 1, 4, MPI_COMM_WORLD);
>         MPI_Send(h_z + Z_OFFSET(my_dim_z), (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank + 1, 5, MPI_COMM_WORLD);
>     }
>     if (my_rank) {
>         MPI_Recv(h_x, (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank - 1, 3, MPI_COMM_WORLD, status);
>         MPI_Recv(h_y, (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank - 1, 4, MPI_COMM_WORLD, status);
>         MPI_Recv(h_z, (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank - 1, 5, MPI_COMM_WORLD, status);
>     }
> }
>
> Sam Adams
> General Dynamics - Network Systems
> Phone: 210.536.5945
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
Cisco Systems
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/