LAM/MPI Development Mailing List Archives

From: wchao_at_[hidden]
Date: 2006-01-26 10:07:37


Yes. I added the new feature as a pseudo-daemon inside lamd,
just like echod, dli_inet, dlo_inet, etc.

I also use nsend()/nrecv() the same way they are used in echod and filed,
but I still hit the issue I mentioned.

So what is different about using nsend/nrecv inside lamd,
and how should I use them? I'm really confused.
It seems the message arrives at the receiver node,
but then gets lost among the daemon processes.

Thank you very much!

Chao
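
The symptom described above (a message reaches the node but not the intended nrecv()) can be illustrated with a toy model of event/type matching. This is only a sketch under assumptions: the struct and the toy_* names are hypothetical and are not LAM code, and the matching rule used here (events equal, and types either both zero or sharing at least one bit) is an assumption that should be checked against the LAM nsend/nrecv man pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the event/type fields of a message header.
 * This is NOT LAM's struct nmsg. */
struct toy_msg {
    int event;
    int type;
};

/* Assumed matching rule: events must be equal, and the types must either
 * both be zero or have at least one bit in common. */
static int toy_matches(const struct toy_msg *snd, const struct toy_msg *rcv)
{
    if (snd->event != rcv->event)
        return 0;
    if (snd->type == 0 && rcv->type == 0)
        return 1;
    return (snd->type & rcv->type) != 0;
}

/* Return the index of the first waiting receiver that matches the
 * outgoing message, or -1 if no receiver matches. */
static int toy_deliver(const struct toy_msg *snd,
                       const struct toy_msg *receivers, size_t n)
{
    assert(snd != NULL);
    assert(receivers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (toy_matches(snd, &receivers[i]))
            return (int)i;
    }
    return -1;
}
```

In this model, if receiver 0 is another pseudo-daemon waiting on a different event and receiver 1 is the new one waiting on the new event with type 0, a message sent with the new event and type 0 goes to receiver 1. If the new event value happened to collide with the one receiver 0 waits on, the same message would go to receiver 0 instead, which would look like a message "lost among the daemon processes".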

> To clarify -- are you adding another pseudo-daemon inside the lamd
> itself? If so, the communication model is a little different using
> nsend/nrecv (vs. processes outside of the lamd).
>
> My answer to your question depends on the answer to the above
> question. :-)
>
>
> On Jan 25, 2006, at 11:34 PM, wchao_at_[hidden] wrote:
>
>> In addition to what I mentioned in the previous mail,
>> I also found the following.
>> For some messages, node 1 sends to node 0 with nsend(), and node 0
>> waits for them with nrecv(). Node 1 does send the message out.
>> From the printf statement I added in dsend(), the message does
>> arrive on node 0. Strangely, it shows up in dsend() there
>> but never reaches nrecv() on node 0. So the message is lost
>> as far as nrecv() on node 0 is concerned.
>>
>> Any ideas on this issue? Thanks a lot!
>>
>> ---------------------------- Original Message ---------------------------
>> I am adding a feature to LAM, and the new feature runs as a single
>> process of lamd. So I defined a priority for it:
>> #define PRNEW PRDAEMON
>> and call kinit(PRNEW) in the new process.
>>
>> In the new code, I use nsend()/ntry_recv() to communicate between the nodes:
>>
>> LAM_ZERO_ME(outgoing);
>> outgoing.nh_node = destination; //0, 1, 2, or 3
>> // on 4 nodes test environment
>> outgoing.nh_event = EVNEW;
>> outgoing.nh_type = 0;
>> outgoing.nh_flags = 0;
>> outgoing.nh_length = strlen(msg) + 1;
>> outgoing.nh_msg = msg;
>>
>> nsend(&outgoing);
>>
>> ...
>>
>> LAM_ZERO_ME(incoming);
>> memset((void*) msg, 0, 256);
>> incoming.nh_event = EVNEW;
>> incoming.nh_flags = 0;
>> incoming.nh_msg = msg;
>> incoming.nh_length = 256;
>> incoming.nh_type = 0;
>>
>> while(ntry_recv(&incoming) == 0){
>>
>> Sometimes nsend()/ntry_recv() works, and all messages between the 4
>> nodes are sent and received.
>>
>> But most of the time, some messages are sent and never received by
>> the receiver, or a sender blocks in nsend() even though the receiver
>> is reachable with tping.
>>
>> I tried adjusting the priority of the new process, changing nh_type
>> and nh_event, and using nrecv() instead of ntry_recv(), but none of
>> that fixed the issue. It seems something is wrong with the event
>> queue: a message sent from the new process is picked up by another
>> process of the lamd, although nh_event should have prevented that.
>> I'm really confused here.
>>
>> So what is wrong? Am I using nsend()/nrecv() correctly, or is
>> something missing?
>>
>> Any comments and suggestions are welcome! Thanks!
>>
>> Chao
>>
>> _______________________________________________
>> lam-devel mailing list
>> lam-devel_at_[hidden]
>> http://www.lam-mpi.org/mailman/listinfo.cgi/lam-devel
>
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/