LAM/MPI Development Mailing List Archives

From: wchao_at_[hidden]
Date: 2006-01-25 23:34:19


I posted this issue to the user list this afternoon,
but I think it is better suited to the developer list, so I am posting it here.

In addition to what I mentioned in the previous mail,
I also found the following:
For some messages, node 1 sends to node 0 with nsend(), and node 0
waits for it with nrecv(). Node 1 does send the message out.
According to the printf statement I added in dsend(), the message does
arrive on node 0, but it shows up in dsend(), which is strange,
and it never reaches nrecv() on node 0. So the message is lost
as far as the nrecv() on node 0 is concerned.

Any ideas about this issue? Thanks a lot!

---------------------------- Original Message ----------------------------
Subject: issue with nsend()/ntry_recv()
From: wchao
Date: Wed, January 25, 2006 4:36 pm
To: lam_at_[hidden]
--------------------------------------------------------------------------

Hi, there,

I am adding a feature to LAM, and the new feature runs as a single
process of lamd. So I defined a priority for it:
#define PRNEW PRDAEMON
and call kinit(PRNEW) in the new process.

In the new code, I use nsend()/ntry_recv() to communicate among the nodes:

    LAM_ZERO_ME(outgoing);
    outgoing.nh_node = destination;  /* 0, 1, 2, or 3 in the 4-node test environment */
    outgoing.nh_event = EVNEW;
    outgoing.nh_type = 0;
    outgoing.nh_flags = 0;
    outgoing.nh_length = strlen(msg) + 1;
    outgoing.nh_msg = msg;

    nsend(&outgoing);

...

            LAM_ZERO_ME(incoming);
            memset((void*) msg, 0, 256);
            incoming.nh_event = EVNEW;
            incoming.nh_flags = 0;
            incoming.nh_msg = msg;
            incoming.nh_length = 256;
            incoming.nh_type = 0;

            while(ntry_recv(&incoming) == 0){

Sometimes nsend()/ntry_recv() works, and all messages among the 4
nodes are sent and received.

But most of the time, some message is sent and the receiver never
receives it, or some message hangs in nsend() even though the receiver
is reachable with tping.

I tried adjusting the priority of the new process, changing nh_type
and nh_event, and using nrecv() instead of ntry_recv(), but none of that
fixed the issue. It seems something is wrong with the event queue: a message
sent from the new process is picked up by another process of the lamd, even
though nh_event should have prevented that. I'm really confused here.

So, what's wrong? Is my use of nsend()/nrecv() correct, or
am I missing something?

Any comments and suggestions are welcome! Thanks!

Chao