> Yes. I added the new feature as a pseudo-daemon of lamd,
> just like echod, dli_inet, dlo_inet, etc.
Let me clarify this point. :)
I am not booting LAM with lamd as one big process.
Instead, I use the following config file to boot LAM as
a cluster of separate processes, including the new process I added:
lamd_kernel $debug $session_prefix $session_suffix
lamd_router $debug $session_prefix $session_suffix
lamd_kenyad $debug $session_prefix $session_suffix
lamd_dli_inet $inet_topo $debug $session_prefix $session_suffix
lamd_dlo_inet $debug $session_prefix $session_suffix
lamd_bufferd $debug $session_prefix $session_suffix
lamd_bforward $debug $session_prefix $session_suffix
lamd_loadd $debug $session_prefix $session_suffix
lamd_echod $debug $session_prefix $session_suffix
lamd_flatd $debug $session_prefix $session_suffix
lamd_filed $debug $session_prefix $session_suffix
lamd_traced $debug $session_prefix $session_suffix
lamd_iod $debug $session_prefix $session_suffix
lamd_haltd $debug $session_prefix $session_suffix
lamd_versiond $debug $session_prefix $session_suffix
lamd_newfeature $debug $session_prefix $session_suffix
Thanks!
>
> Also, I use nsend()/nrecv() the same way they are used in echod and filed,
> but I still hit the issue I mentioned.
>
> So what is different about using nsend/nrecv in that case,
> and how should I use them? I find this really confusing.
> It seems the message arrives at the receiver node
> but is then lost among the daemon processes.
>
> Thank you very much!
>
> Chao
>
>> To clarify -- are you adding another pseudo-daemon inside the lamd
>> itself? If so, the communication model is a little different using
>> nsend/nrecv (vs. processes outside of the lamd).
>>
>> My answer to your question depends on the answer to the above
>> question. :-)
>>
>>
>> On Jan 25, 2006, at 11:34 PM, wchao_at_[hidden] wrote:
>>
>>> In addition to the things I mentioned in my previous mail,
>>> I also found the following:
>>> For some messages, node 1 sends to node 0 with nsend(), and node 0
>>> waits for them with nrecv(). Node 1 does send the message out.
>>> A printf statement I added in dsend() shows that the message does
>>> arrive on node 0, but it shows up in dsend(), which is strange,
>>> and never reaches nrecv() on node 0. So the message is lost
>>> as far as the nrecv() on node 0 is concerned.
>>>
>>> Any ideas about this issue? Thanks a lot!
>>>
>>> ---------------------------- Original Message ----------------------------
>>> I am adding a feature to LAM, and the new feature runs as a single
>>> process of lamd. So I defined a priority for it:
>>> #define PRNEW PRDAEMON
>>> and call kinit(PRNEW) in the new process.
>>>
>>> In the new code, I use nsend()/ntry_recv() to communicate between the nodes:
>>>
>>> LAM_ZERO_ME(outgoing);
>>> outgoing.nh_node = destination; // 0, 1, 2, or 3 in the 4-node test environment
>>> outgoing.nh_event = EVNEW;
>>> outgoing.nh_type = 0;
>>> outgoing.nh_flags = 0;
>>> outgoing.nh_length = strlen(msg) + 1;
>>> outgoing.nh_msg = msg;
>>>
>>> nsend(&outgoing);
>>>
>>> ...
>>>
>>> LAM_ZERO_ME(incoming);
>>> memset((void*) msg, 0, 256);
>>> incoming.nh_event = EVNEW;
>>> incoming.nh_flags = 0;
>>> incoming.nh_msg = msg;
>>> incoming.nh_length = 256;
>>> incoming.nh_type = 0;
>>>
>>> while(ntry_recv(&incoming) == 0){
>>>
>>> Sometimes nsend()/ntry_recv() works, and all messages between the 4
>>> nodes are sent and received.
>>>
>>> But most of the time, during communication, some message is sent
>>> and the receiver never receives it, or some message hangs in
>>> nsend() even though the receiver is reachable with tping.
>>>
>>> I tried adjusting the priority of the new process, changing
>>> nh_type and nh_event, and using nrecv() instead of ntry_recv(), but
>>> none of that fixed the issue. Something seems to be wrong with the
>>> event queue: a message sent from the new process is picked up by
>>> another process of the lamd, although nh_event should have prevented
>>> that. I'm really confused here.
>>>
>>> So, what is wrong? Is my use of nsend()/nrecv() right, or is
>>> something missing?
>>>
>>> Any comments and suggestions are welcome! Thanks!
>>>
>>> Chao
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> lam-devel mailing list
>>> lam-devel_at_[hidden]
>>> http://www.lam-mpi.org/mailman/listinfo.cgi/lam-devel
>>
>>
>> --
>> {+} Jeff Squyres
>> {+} The Open MPI Project
>> {+} http://www.open-mpi.org/
>>
>
>