It seems that a LAM process cannot call nsend()/nrecv() from several
threads at once; if it does, a call may block forever on
read(socket-to-the-lam-kernel-process). Is that because there is only a
single socket between the process and the LAM kernel process?
In any case, once I changed my new process to run sequentially instead
of multithreaded, the issue disappeared.
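
A minimal sketch of what such a sequential flow could look like: one loop does at most one receive and at most one send per pass, so only one call at a time ever touches the kernel socket. Everything here is a hypothetical illustration; stub_try_recv() and stub_send() merely stand in for ntry_recv() and nsend(), and the queue type and its sizes are assumptions, not LAM code.

```c
#include <assert.h>
#include <string.h>

#define MSG_MAX   256  /* same buffer size as in the quoted snippets */
#define QUEUE_CAP 8    /* hypothetical queue depth */

/* Outgoing messages wait here until the loop gets around to sending them. */
struct out_queue {
    char msgs[QUEUE_CAP][MSG_MAX];
    int head;   /* index of the oldest queued message */
    int count;  /* number of queued messages */
};

int pending_in;  /* fake incoming messages still waiting (stub state) */
int sent;        /* messages handed to stub_send() so far (stub state) */

/* Hypothetical stand-in for ntry_recv(): 0 if a message was read, -1 if none. */
int stub_try_recv(char *buf, size_t len)
{
    if (pending_in == 0)
        return -1;
    pending_in--;
    strncpy(buf, "incoming", len - 1);
    buf[len - 1] = '\0';
    return 0;
}

/* Hypothetical stand-in for nsend(): only counts the call. */
int stub_send(const char *msg)
{
    (void)msg;
    sent++;
    return 0;
}

/* Queue a message; returns 0 on success, -1 if the queue is full or msg too long. */
int queue_push(struct out_queue *q, const char *msg)
{
    if (q->count == QUEUE_CAP || strlen(msg) >= MSG_MAX)
        return -1;
    int tail = (q->head + q->count) % QUEUE_CAP;
    strcpy(q->msgs[tail], msg);
    q->count++;
    return 0;
}

/* One pass of the loop: at most one receive, then at most one send. */
int comm_step(struct out_queue *q)
{
    char in[MSG_MAX];

    assert(q != NULL);
    if (stub_try_recv(in, sizeof in) == 0) {
        /* React to the incoming message by queueing a reply. */
        if (queue_push(q, "ack") != 0)
            return -1;
    }
    if (q->count > 0) {
        if (stub_send(q->msgs[q->head]) != 0)
            return -1;
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    return 0;
}
```

In a real daemon the two stubs would be replaced by the actual LAM calls, and comm_step() would be called repeatedly from the main loop. If other threads are still needed, they could push onto the queue under a lock instead of calling into LAM themselves.
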
Thanks!
> Hi Brian,
>
> Thank you very much! And thanks to Dr. Squyres!
>
> I tried the following configuration file, and did everything else
> as you indicated (in fact, I always did that :-) ):
> lamd $inet_topo $debug $session_prefix $session_suffix
> lamd_newfeature $debug $session_prefix $session_suffix
> But I still got the same issue.
>
> As before, with the same code and the same environment, it sometimes
> works, and all messages are sent and received. But most of the time
> some message is lost, or some message cannot be sent out.
>
> I am investigating the LAM code, trying to figure out how it works
> and how to fix the issue. The communication mechanism seems very
> complicated.
>
> THE MESSAGE-LOSS FAILURE
> 1. I ran LAM with the new feature process on four nodes: n0-n3.
> 2. A lot of messages were sent and received among the nodes.
> 3. Then some message was lost (sometimes this message, sometimes that
>    message).
> 4. Say the lost message is from n0 to n3:
>    - on node n0, the printf statement I added in nsend shows that
>      this message was sent out
>    - on node n3, the message arrived but did not reach nrecv()
>    state n3: -- shows nothing except the "NODE INDEX PID..." line
>    state n0: -- this command hangs (it just stops)
>    bfstate n3: -- shows
>      NODE  DEST  EVENT     TYPE  LENGTH
>      n3    n3    40000016  0     9
>      n3    n3    40000016  0     9
>    bfstate n0: -- hangs, like state n0
>
> I implemented two versions of the receive code: ntry_recv(), which
> polls repeatedly for the message, and nrecv() in a thread, which
> blocks on the event until the message arrives. Both versions show
> the issue described above.
>
> MORE INFORMATION ON THE NEW DAEMON (the main logic)
> 1A. daemon process
>
> int main(int argc, char *argv[])
> {
>     ...
>     // Initialize, deny-root, umask(077), ao_init(), ao_parse(),
>     // lam_tmpdir_init(), etc.
>     ...
>     /*
>      * Attach to kernel.
>      */
>     if (kinit(PRGRPCOMM)) {
>         lampanic("grp_comm (kinit)");
>     }
>
>     /*
>      * Attach this process under the name "grp_comm".
>      */
>     if (lpattach("grp_comm")) {
>         lampanic("grp_comm (lpattach)");
>     }
>     ...
>     // pseudocode: node n0 sends an initial message to every other node
>     if(n0)
>         for(n1 to n-last)
>             send_message(from, to, msg);
>
>     ...
>     // in the main of the process, we wait for a message;
>     // when we get one, we deal with it and
>     // may send out some messages to other nodes
>     while(1) {
>
>         LAM_ZERO_ME(incoming);
>         memset((void*) msg, 0, 256);
>         incoming.nh_event = GRP_COMM;
>         incoming.nh_flags = 0;
>         incoming.nh_msg = msg;
>         incoming.nh_length = 256;
>         incoming.nh_type = 0;
>
>         if(ntry_recv(&incoming) == 0){
>             ...
>             send_message(from, to, msg);
>             ...
>         }
>     }
>     ...
> } //end of main
>
> int send_message(int source, int destination, char msg[])
> {
>     ...
>     LAM_ZERO_ME(outgoing);
>     outgoing.nh_node = destination;
>     outgoing.nh_event = GRP_COMM;
>     outgoing.nh_type = 0;
>     outgoing.nh_flags = 0;
>     outgoing.nh_length = strlen(msg) + 1; // include the terminating NUL
>     outgoing.nh_msg = msg;
>
>     if(nsend(&outgoing) == 0){
>         return 0;
>     }
>
>     ...
> }
>
> 1B. another version that receives messages with nrecv() instead of
> ntry_recv()
>
> In main(), create a thread to wait for messages:
>
> void *pnrecv(void *arg)
> {
>     char msg[256];
>     while(active) {
>         // wait until the main thread asks for the next receive
>         pthread_mutex_lock(&nrecv_lock);
>         while(nrecv_tag == FALSE)
>             pthread_cond_wait(&nrecv_cond, &nrecv_lock);
>         nrecv_tag = FALSE;
>         pthread_mutex_unlock(&nrecv_lock);
>
>         LAM_ZERO_ME(incoming);
>         memset((void*) msg, 0, 256);
>         incoming.nh_event = GRP_COMM;
>         incoming.nh_flags = 0;
>         incoming.nh_msg = msg;
>         incoming.nh_type = 174;
>         incoming.nh_length = 256;
>
>         if(nrecv(&incoming)) lampanic("grp_comm (nrecv)");
>
>         // hand the received message over to the main thread
>         pthread_mutex_lock(&msg_lock);
>         memcpy(nrecv_msg, msg, sizeof(msg));
>         pthread_mutex_unlock(&msg_lock);
>
>         pthread_mutex_lock(&flag_lock);
>         nrecv_flag = 1;
>         pthread_mutex_unlock(&flag_lock);
>
>         fclose(fp_recv);
>     }
>     ...
>     return NULL;
> }
>
> But neither nrecv() nor ntry_recv() works correctly.
>
> 2. add
> #define GRP_COMM (LAM_BASE_EVENT + 25)
> to lam/include/event.h and share/include/event.h
>
> 3. add
> #define PRGRPCOMM PRDAEMON
> to lam/include/priority.h and share/include/priority.h
>
> The rest of the code is not related to the use of nrecv()/nsend().
> The main logic and message control flow are as shown above.
>
> I read a lot of the code of the other daemons under otb/sys. There
> seems to be nothing special about their use of nrecv()/nsend(),
> and I don't think I missed anything in the new daemon (right?).
>
> I am diving into the LAM message-passing code now.
> Any comments and suggestions are welcome!
>
> Thanks and best regards,
> Chao
>
>> Sorry about the long delay in replying -- I've been out of town the
>> last couple of days.
>>
>> It looks like you have the right idea, and I don't see anything
>> obviously wrong from your code snippet. You took the approach that I
>> would have taken in having your new "daemon" be a separate process.
>> When looking at all the "daemons" in otb/sys/, keep in mind that the
>> code is designed to be used in either one big process or a bunch of
>> little processes. The "one big process" model was added after the
>> project was well underway, and plays some games to make it work. One
>> of those games is that nsend / nrecv don't actually block. Instead,
>> they schedule activity to be done when control is returned to the
>> kernel. There are some very tight restrictions placed on the use of
>> nsend / nrecv when running this way. In particular, there can only
>> be one nsend and one nrecv call scheduled during a single function
>> call, then control must return to the kernel before the next send /
>> recv can be posted. Also, packet length is strictly limited to being
>> less than MAXNMSGSIZE bytes long. Since you are running in your own
>> process, none of these apply to you (thankfully - makes life much
>> easier that way).
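
For reference, the length restriction described above could be respected by splitting a payload into pieces, as in the sketch below. It is only an illustration: DEMO_MAXNMSGSIZE is a made-up value (the real MAXNMSGSIZE comes from the LAM headers), and send_in_chunks() and count_bytes() are hypothetical helpers, with the callback standing in for a real nsend() call.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAXNMSGSIZE 8  /* hypothetical limit, for illustration only */

/* Callback that sends one packet; returns 0 on success. */
typedef int (*send_fn)(const char *chunk, size_t len, void *ctx);

/* Send `len` bytes in pieces strictly shorter than DEMO_MAXNMSGSIZE.
 * Returns the number of packets sent, or -1 if send_one() fails. */
int send_in_chunks(const char *data, size_t len, send_fn send_one, void *ctx)
{
    const size_t max_chunk = DEMO_MAXNMSGSIZE - 1;
    int packets = 0;

    assert(data != NULL && send_one != NULL);
    while (len > 0) {
        size_t n = len < max_chunk ? len : max_chunk;
        if (send_one(data, n, ctx) != 0)
            return -1;
        data += n;
        len -= n;
        packets++;
    }
    return packets;
}

/* Hypothetical sender for demonstration: just totals the bytes it is given. */
int count_bytes(const char *chunk, size_t len, void *ctx)
{
    (void)chunk;
    *(size_t *)ctx += len;
    return 0;
}
```

As the quoted text says, a daemon running in its own process is not bound by this limit, so this would matter only for code running inside the single big lamd.
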
>>
>> By the way, there's no reason you can't modify the lamd-conf.lamd
>> file (the one for the one big-lamd model) to start your new_feature
>> daemon next to the lamd. Your file would then look like:
>>
>> lamd $inet_topo $debug $session_prefix $session_suffix
>> lamd_newfeature $debug $session_prefix $session_suffix
>>
>> I don't know if that's useful or not to you, but thought I would
>> point out that it should work.
>>
>> Like I said, I don't see anything obvious in your code, so I'll point
>> out a couple of things. If these don't help, I might need to look at some
>> more of your code to be of any real help. First, your process should
>> kinit() with a priority of PRDAEMON. There's really no benefit to
>> having your own priority. There are a couple of services that
>> require it, but I don't think that what you are trying to do would
>> fit that category. One very, very important thing is that your EVNEW
>> constant be defined to something that isn't used elsewhere. Look at
>> the long comments in share/include/event.h to see how event numbers
>> are chosen in LAM. This is a frequent issue when using the LAM
>> communication layer.
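
One way to guard against a clash is to check a candidate event number against the ones already in use, as in the sketch below. All values here are invented for illustration (DEMO_BASE_EVENT and the DEMO_EV_* constants are not LAM's real definitions); the actual list would have to be taken from event.h.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BASE_EVENT 1000                    /* hypothetical base value */
#define DEMO_EV_A       (DEMO_BASE_EVENT + 1)   /* hypothetical existing event */
#define DEMO_EV_B       (DEMO_BASE_EVENT + 2)   /* hypothetical existing event */
#define DEMO_EV_NEW     (DEMO_BASE_EVENT + 25)  /* candidate for the new daemon */

/* Return 1 if `candidate` differs from every entry in `used`, else 0. */
int event_is_unused(int candidate, const int *used, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (used[i] == candidate)
            return 0;
    }
    return 1;
}
```

A check like this could run once at daemon start-up, before the first nsend()/nrecv() call.
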
>>
>> If none of that helps, the next thing is to try to use some of the
>> LAM tools to look at where your messages are going. You can use
>> state and bfstate commands (LAM must be configured with the
>> --with-trillium option for these to be built and installed) to see what is
>> running on a particular node and what is up with the communication
>> buffers on a particular node.
>>
>> Hope this helps a bit. If not, let me know, including as much
>> information about your daemon as possible. It's a bit hard to debug
>> code at that level and even harder when I don't know what the code is
>> doing ;).
>>
>> Brian
>>
>>
>
>