Sorry, you were right. I thought I had commented out all of the MPI
communication except what I posted below, but it turned out that
another little function was hiding out, sending a couple of floats
with a negative tag. I had forgotten about that one. I guess I was
thinking that the tag only had to be an int, not necessarily a
non-negative one.
Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945
-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf
Of Jeff Squyres
Sent: Thursday, May 10, 2007 8:53 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: MPI error message
Well, that's pretty kooky. :-(
Here's the code from MPI_SEND that's generating the error:
    if (tag < 0 || tag > lam_mpi_max_tag) {
        return(lam_err_comm(comm, MPI_ERR_TAG, EINVAL,
                            "out of range"));
    }
But according to your code, that can't be happening because your tags
are fixed positive integers (lam_mpi_max_tag is at least 32k).
Are you absolutely certain that this is where the problem is occurring?
You might want to either run this through a debugger to verify that
a) this is where the problem is occurring, and b) what LAM thinks its
getting as a tag value. Or you could write some quick MPI_Send /
MPI_Recv intercept functions that utilize the PMPI layer, perhaps
something like this:
int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest,
             int tag, MPI_Comm comm)
{
    if (tag < 0 || tag > 32767) {
        char host[4096];
        int i = 0;
        gethostname(host, sizeof(host));
        printf("%s:%d: got invalid tag in MPI_Send! %d\n",
               host, getpid(), tag);
        while (i == 0) sleep(5);
    }
    return PMPI_Send(buf, count, dtype, dest, tag, comm);
}
(disclaimer: typed in e-mail; not verified!)
This will print out the host/pid of the offending process(es) and
pause, allowing you to attach a debugger. Modify the inner part of
the block to suit your particular debugging tastes.
On May 9, 2007, at 12:12 PM, Adams, Samuel D Contr AFRL/HEDR wrote:
> I am getting this error when I run my code with LAM. I was using this
> code on another system running a slightly older MPICH and didn't get
> any errors like this. It would seem there is something wrong with the
> way I am sending and receiving slices. Can you see anything obviously
> wrong with the way I am doing this?
>
> * Starting updates
> * cycle 1
> MPI_Recv: invalid tag argument: Invalid argument (rank 0, MPI_COMM_WORLD)
> MPI_Send: invalid tag argument: Invalid argument: out of range (rank 1,
> MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Send()
> Rank (1, MPI_COMM_WORLD): - main()
> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> Rank (0, MPI_COMM_WORLD): - main()
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 22373 failed on node n0 (127.0.0.1) with exit status 22.
> -----------------------------------------------------------------------------
> mpirun failed with exit status 22
>
> ===========================code==========================================
> void hSndRcv(){
>     if(my_rank != comm_size-1){
>         MPI_Send(h_x+Z_OFFSET(my_dim_z),
>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank+1, 3, MPI_COMM_WORLD);
>         MPI_Send(h_y+Z_OFFSET(my_dim_z),
>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank+1, 4, MPI_COMM_WORLD);
>         MPI_Send(h_z+Z_OFFSET(my_dim_z),
>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank+1, 5, MPI_COMM_WORLD);
>     }
>     if(my_rank){
>         MPI_Recv(h_x, (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank-1, 3, MPI_COMM_WORLD, status);
>         MPI_Recv(h_y, (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank-1, 4, MPI_COMM_WORLD, status);
>         MPI_Recv(h_z, (dim_x + 2*pml)*(dim_y + 2*pml),
>                  MPI_FLOAT, my_rank-1, 5, MPI_COMM_WORLD, status);
>     }
> }
>
> Sam Adams
> General Dynamics - Network Systems
> Phone: 210.536.5945
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/