LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-05-11 14:40:56


That's the dichotomy of the MPI spec -- there are many places where the
parameters are "int", but they really should be other types (e.g.,
signed or unsigned, specifically-sized such as int32_t, etc.).
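
To make the range rule concrete, here is a minimal sketch of the check discussed in this thread, written as plain C so it compiles without an MPI installation. The helper names and the `any_tag` parameter are hypothetical, introduced only for illustration. In a real program the upper bound would come from the MPI_TAG_UB attribute (queried with MPI_Comm_get_attr), which the MPI standard guarantees to be at least 32767, and `any_tag` would be the implementation's MPI_ANY_TAG.

```c
#include <stdbool.h>

/* Smallest value the MPI standard allows for the MPI_TAG_UB attribute. */
#define MIN_GUARANTEED_TAG_UB 32767

/* Hypothetical helper: a send tag is legal only in [0, tag_ub],
 * even though the parameter is declared as a signed int. */
static bool send_tag_ok(int tag, int tag_ub)
{
    return tag >= 0 && tag <= tag_ub;
}

/* Hypothetical helper: a receive tag may also be the wildcard value.
 * `any_tag` stands in for MPI_ANY_TAG, whose numeric value is
 * implementation-defined, so it is passed in rather than assumed. */
static bool recv_tag_ok(int tag, int tag_ub, int any_tag)
{
    return tag == any_tag || send_tag_ok(tag, tag_ub);
}
```

For example, `send_tag_ok(3, MIN_GUARANTEED_TAG_UB)` is true, while `send_tag_ok(-1, MIN_GUARANTEED_TAG_UB)` is false, which corresponds to the "out of range" error reported for the negative tag.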

MPI is fun! :D

On May 11, 2007, at 2:26 PM, Adams, Samuel D Contr AFRL/HEDR wrote:

> Sorry, you were right. I thought that I had commented out all of the MPI
> communication except what I had posted below, but it turned out that I
> had another little function hiding out that was sending a couple of
> floats, and it had a negative tag. I forgot about that one. For some
> reason, I guess I was thinking that the tag only had to be an int and
> not necessarily an unsigned int.
>
> Sam Adams
> General Dynamics - Network Systems
> Phone: 210.536.5945
>
> -----Original Message-----
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Thursday, May 10, 2007 8:53 PM
> To: General LAM/MPI mailing list
> Subject: Re: LAM: MPI error mesage
>
> Well, that's pretty kooky. :-(
>
> Here's the code from MPI_SEND that's generating the error:
>
>     if (tag < 0 || tag > lam_mpi_max_tag) {
>         return(lam_err_comm(comm, MPI_ERR_TAG, EINVAL,
>                             "out of range"));
>     }
>
> But according to your code, that can't be happening because your tags
> are fixed positive integers (lam_mpi_max_tag is at least 32k).
>
> Are you absolutely certain that this is where the problem is
> occurring?
>
> You might want to run this through a debugger to verify a) that this
> is where the problem is occurring, and b) what LAM thinks it's
> getting as a tag value. Or you could write some quick MPI_Send /
> MPI_Recv intercept functions that utilize the PMPI layer, perhaps
> something like this:
>
> #include <stdio.h>
> #include <unistd.h>   /* gethostname, getpid, sleep */
> #include <mpi.h>
>
> int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest,
>              int tag, MPI_Comm comm)
> {
>     if (tag < 0 || tag > 32767) {
>         char host[4096];
>         volatile int i = 0;  /* set nonzero from the debugger to continue */
>         gethostname(host, sizeof(host));
>         printf("%s:%d: got invalid tag in MPI_Send! %d\n",
>                host, (int) getpid(), tag);
>         fflush(stdout);
>         while (i == 0) sleep(5);
>     }
>     return PMPI_Send(buf, count, dtype, dest, tag, comm);
> }
>
> (disclaimer: typed in e-mail; not verified!)
>
> This will print out the host/pid of the offending process(es) and
> pause, allowing you to attach a debugger. Modify the inner part of
> the block to suit your particular debugging tastes.
>
>
> On May 9, 2007, at 12:12 PM, Adams, Samuel D Contr AFRL/HEDR wrote:
>
>> I am getting this error when I run my code with LAM. I was using this
>> code with another system that was running with a slightly older MPICH
>> and didn't get any errors like this. It would seem there is something
>> wrong with the way I am sending and receiving slices. Can you see
>> anything obviously wrong with the way I am doing this?
>>
>> * Starting updates
>> * cycle 1
>> MPI_Recv: invalid tag argument: Invalid argument (rank 0, MPI_COMM_WORLD)
>> MPI_Send: invalid tag argument: Invalid argument: out of range (rank 1, MPI_COMM_WORLD)
>> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (1, MPI_COMM_WORLD): - MPI_Send()
>> Rank (1, MPI_COMM_WORLD): - main()
>> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (0, MPI_COMM_WORLD): - main()
>> -----------------------------------------------------------------------------
>> One of the processes started by mpirun has exited with a nonzero exit
>> code. This typically indicates that the process finished in error.
>> If your process did not finish in error, be sure to include a "return
>> 0" or "exit(0)" in your C code before exiting the application.
>>
>> PID 22373 failed on node n0 (127.0.0.1) with exit status 22.
>> -----------------------------------------------------------------------------
>> mpirun failed with exit status 22
>>
>> ===========================code==========================================
>> void hSndRcv(){
>>     if(my_rank != comm_size-1){
>>         MPI_Send(h_x+Z_OFFSET(my_dim_z),
>>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>>                  MPI_FLOAT,
>>                  my_rank+1,
>>                  3,
>>                  MPI_COMM_WORLD);
>>         MPI_Send(h_y+Z_OFFSET(my_dim_z),
>>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>>                  MPI_FLOAT,
>>                  my_rank+1,
>>                  4,
>>                  MPI_COMM_WORLD);
>>         MPI_Send(h_z+Z_OFFSET(my_dim_z),
>>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>>                  MPI_FLOAT,
>>                  my_rank+1,
>>                  5,
>>                  MPI_COMM_WORLD);
>>     }
>>     if(my_rank){
>>         MPI_Recv(h_x,
>>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>>                  MPI_FLOAT,
>>                  my_rank-1,
>>                  3,
>>                  MPI_COMM_WORLD,
>>                  status);
>>         MPI_Recv(h_y,
>>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>>                  MPI_FLOAT,
>>                  my_rank-1,
>>                  4,
>>                  MPI_COMM_WORLD,
>>                  status);
>>         MPI_Recv(h_z,
>>                  (dim_x + 2*pml)*(dim_y + 2*pml),
>>                  MPI_FLOAT,
>>                  my_rank-1,
>>                  5,
>>                  MPI_COMM_WORLD,
>>                  status);
>>     }
>> }
>>
>> Sam Adams
>> General Dynamics - Network Systems
>> Phone: 210.536.5945
>>
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>
> --
> Jeff Squyres
> Cisco Systems

-- 
Jeff Squyres
Cisco Systems