LAM/MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-01-26 08:33:14


To clarify -- are you adding another pseudo-daemon inside the lamd
itself? If so, the communication model is a little different using
nsend/nrecv (vs. processes outside of the lamd).

My answer to your question depends on the answer to the above
question. :-)

On Jan 25, 2006, at 11:34 PM, wchao_at_[hidden] wrote:

> In addition to what I mentioned in my previous mail, I also found
> the following: for some messages, node 1 sends to node 0 with nsend(),
> and node 0 waits for it with nrecv(). Node 1 does send the message out.
> From the printf statement I added in dsend(), the message does
> arrive on node 0, but it shows up in dsend(), which is strange, and it
> never reaches nrecv() on node 0. So the message is lost as far as
> the nrecv() on node 0 is concerned.
>
> Any idea what could cause this? Thanks a lot!
>
> ---------------------------- Original Message ----------------------------
> I am adding a feature to LAM, and the new feature runs as a single
> process of the lamd. So I defined a priority for it:
> #define PRNEW PRDAEMON
> and call kinit(PRNEW) in the new process.
>
> In the new code, I used nsend()/ntry_recv() to communicate among them:
>
> LAM_ZERO_ME(outgoing);
> outgoing.nh_node = destination; // 0, 1, 2, or 3 on the 4-node test environment
> outgoing.nh_event = EVNEW;
> outgoing.nh_type = 0;
> outgoing.nh_flags = 0;
> outgoing.nh_length = strlen(msg) + 1;
> outgoing.nh_msg = msg;
>
> nsend(&outgoing);
>
> ...
>
> LAM_ZERO_ME(incoming);
> memset((void*) msg, 0, 256);
> incoming.nh_event = EVNEW;
> incoming.nh_flags = 0;
> incoming.nh_msg = msg;
> incoming.nh_length = 256;
> incoming.nh_type = 0;
>
> while(ntry_recv(&incoming) == 0){
>
> Sometimes nsend()/ntry_recv() works, and all messages between the 4
> nodes are sent and received.
>
> But most of the time, some message is sent and the receiver never
> receives it, or some message hangs in nsend() even though the
> receiver is reachable with tping.
>
> I tried adjusting the priority of the new process, changing nh_type
> and nh_event, and using nrecv() instead of ntry_recv(), but none of
> that fixed the issue. It seems something is wrong with the event queue:
> a message sent from the new process is picked up by another process of
> the lamd, even though the nh_event should have prevented that. I'm
> really confused here.
>
> So, what's wrong? Is my use of nsend()/nrecv() right, or is
> something missing?
>
> Any comments and suggestions are welcome! Thanks!
>
> Chao

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
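
One way to reason about the symptom in the quoted message (a message sent by the new process being picked up by another process of the lamd) is to model how a sent message gets paired with a waiting receiver. The sketch below is a standalone toy model, not LAM code. It assumes the synchronization rule usually described for nsend()/nrecv(): events must be equal, and types must either both be zero or share at least one bit. It also assumes that the earliest waiting receiver wins, which may not reflect what the lamd actually does. struct sync_key, sync_matches(), first_match(), and any numeric event values are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the nh_event / nh_type pair of a message header. */
struct sync_key {
    int event;
    int type;
};

/* Assumed matching rule: events are equal, and the types are either
 * both zero or have at least one bit in common. */
bool sync_matches(struct sync_key sender, struct sync_key receiver)
{
    if (sender.event != receiver.event) {
        return false;
    }
    if (sender.type == 0 && receiver.type == 0) {
        return true;
    }
    return (sender.type & receiver.type) != 0;
}

/* Return the index of the first waiting receiver that matches the sent
 * message, or -1 if none does. "First one wins" is an assumption of
 * this model, not a documented guarantee. */
int first_match(struct sync_key sent, const struct sync_key *waiting, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (sync_matches(sent, waiting[i])) {
            return (int)i;
        }
    }
    return -1;
}
```

Under these assumptions, two receivers waiting on the same event with a type of 0 are indistinguishable, so a message can go to whichever one is found first. Giving each receiver a distinct type bit would separate them in this model.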