LAM/MPI Development Mailing List Archives

From: wchao_at_[hidden]
Date: 2006-01-29 02:36:09


Hi Brian,

Thank you very much! And thank Dr. Squyres!

I tried the following configuration file, and did everything else
as you indicated (in fact, I had always been doing that :-) ):
   lamd $inet_topo $debug $session_prefix $session_suffix
   lamd_newfeature $debug $session_prefix $session_suffix
But I still got the same issue.

As before, with the same code and the same environment, it sometimes
works and all messages are sent and received. But most of the time,
some message is lost, or some message cannot be sent out.

I am investigating the LAM code and trying to figure out how it works
and how to fix the issue. The communication mechanism seems to be very
complicated.

THE FAILURE OF MESSAGE LOSS
1. I ran lam with new feature process on four nodes: n0-n3.
2. A lot of messages were sent and received among the nodes.
3. Then, some message was lost (sometimes this message, sometimes that
message).
4. Say, the message lost is from n0 to n3:
   - on node n0, we can see from the printf statement
     I added in nsend that this message is sent out
   - on node n3, the message arrived, but did not reach nrecv()
state n3: -- shows nothing except the line of "NODE INDEX PID..."
state n0: -- this command hangs (just stops)
bfstate n3: -- shows
           NODE DEST EVENT TYPE LENGTH
           n3 n3 40000016 0 9
           n3 n3 40000016 0 9
bfstate n0: -- like state n0, this command hangs

I implemented two versions of the receive code: ntry_recv() to poll
repeatedly for the message, and nrecv() in a thread that blocks on the
event and waits for the message. Both of them show the issue described
above.

MORE INFORMATION ON THE NEW DAEMON (the main logic)
1A. daemon process

int main(int argc, char *argv[])
{
...
// Initialize, deny-root, umask(077), ao_init(), ao_parse(),
// lam_tmpdir_init(), etc.
...
/*
 * Attach to kernel.
 */
        if (kinit(PRGRPCOMM)) {
          lampanic("grp_comm (kinit)");
        }

/*
 * Attach the process under the name "grp_comm".
 */
        if (lpattach("grp_comm")) {
                lampanic("grp_comm (lpattach)");
        }
...
/* pseudocode: n0 sends an initial message to every other node */
if(n0)
     for(n1 to n-last)
         send_message(from, to, msg);

...
// in the main loop of the process, we wait for a
// message; when we get one, we handle it and may
// send out some messages to other nodes
while(1) {

            LAM_ZERO_ME(incoming);
            memset((void*) msg, 0, 256);
            incoming.nh_event = GRP_COMM;
            incoming.nh_flags = 0;
            incoming.nh_msg = msg;
            incoming.nh_length = 256;
            incoming.nh_type = 0;

            if(ntry_recv(&incoming) == 0){
                      ...
                      send_message(from, to, msg);
                      ...
            }
}
...
} //end of main

int send_message(int source, int destination, char msg[])
{
...
    LAM_ZERO_ME(outgoing);
    outgoing.nh_node = destination;
    outgoing.nh_event = GRP_COMM;
    outgoing.nh_type = 0;
    outgoing.nh_flags = 0;
    outgoing.nh_length = strlen(msg) + 1;
    outgoing.nh_msg = msg;

    if(nsend(&outgoing) == 0){
        return 0;
    }

...
}
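
For reference, a minimal sketch of a send helper that returns a value on every path and rejects payloads that would not fit the 256-byte receive buffer used above. It is only an illustration under stated assumptions: struct msg_hdr and fake_nsend() are hypothetical stand-ins so the snippet compiles without the LAM headers, and GRP_COMM_DEMO is a placeholder, not the real event number. In the daemon they would be replaced by the LAM message header, nsend() and GRP_COMM.

```c
#include <assert.h>
#include <string.h>

#define RECV_BUF_SIZE 256   /* matches the receive buffer size in the post */
#define GRP_COMM_DEMO 25    /* placeholder, NOT the real GRP_COMM value */

/* Hypothetical stand-in carrying only the header fields used in the post. */
struct msg_hdr {
    int   nh_node;
    int   nh_event;
    int   nh_type;
    int   nh_flags;
    int   nh_length;
    char *nh_msg;
};

/* Hypothetical stand-in for nsend(): records the header and reports success. */
struct msg_hdr last_sent;

int fake_nsend(struct msg_hdr *hdr)
{
    last_sent = *hdr;
    return 0;
}

/* Returns 0 on success, -1 if the payload is too long or the send fails. */
int send_message_checked(int destination, char *msg)
{
    struct msg_hdr outgoing;
    size_t len;

    assert(msg != NULL);
    len = strlen(msg) + 1;              /* include the terminating NUL */
    if (len > RECV_BUF_SIZE)
        return -1;                      /* would overflow the receiver's buffer */

    memset(&outgoing, 0, sizeof(outgoing));
    outgoing.nh_node   = destination;
    outgoing.nh_event  = GRP_COMM_DEMO;
    outgoing.nh_type   = 0;
    outgoing.nh_flags  = 0;
    outgoing.nh_length = (int)len;
    outgoing.nh_msg    = msg;

    return fake_nsend(&outgoing) == 0 ? 0 : -1;
}
```

Calling send_message_checked(3, buf) with a short string fills the header and returns 0; a string of 256 or more characters is refused with -1 before anything is sent.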

1B. Another version that receives messages with nrecv() instead of ntry_recv()

In main(), create a thread to wait for message:

void pnrecv()
{
  char msg[256];
  while(active) {
    pthread_mutex_lock(&nrecv_lock);
    while(nrecv_tag == FALSE)
      pthread_cond_wait(&nrecv_cond, &nrecv_lock);
    nrecv_tag = FALSE;
    pthread_mutex_unlock(&nrecv_lock);

    LAM_ZERO_ME(incoming);
    memset((void*) msg, 0, 256);
    incoming.nh_event = GRP_COMM;
    incoming.nh_flags = 0;
    incoming.nh_msg = msg;
    incoming.nh_type = 174;
    incoming.nh_length = 256;

    if(nrecv(&incoming)) lampanic("grp_comm (nrecv)");

    pthread_mutex_lock(&msg_lock);
    memcpy(nrecv_msg, msg, sizeof(msg));
    pthread_mutex_unlock(&msg_lock);

    pthread_mutex_lock(&flag_lock);
    nrecv_flag = 1;
    pthread_mutex_unlock(&flag_lock);

    fclose(fp_recv);
  }
...
}

But neither nrecv() nor ntry_recv() works correctly.
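
The hand-off between the receiving thread and the rest of the daemon can be sketched in isolation as follows. This is a simplified illustration, not the daemon's code: it uses one mutex and one condition variable to guard both the shared buffer and the "message ready" flag, and fake_blocking_recv() is a hypothetical stand-in for the blocking nrecv() call so the snippet builds without LAM. All names are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MSG_SIZE 256

/* One lock protects both the shared buffer and the ready flag. */
pthread_mutex_t msg_lock  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  msg_ready = PTHREAD_COND_INITIALIZER;
char shared_msg[MSG_SIZE];
int  have_msg = 0;

/* Hypothetical stand-in for the blocking nrecv() call; always succeeds. */
int fake_blocking_recv(char *buf, size_t size)
{
    assert(size > 0);
    strncpy(buf, "hello from n0", size - 1);
    buf[size - 1] = '\0';
    return 0;
}

/* Receiver thread: receive outside the lock, then publish under the lock. */
void *receiver_thread(void *arg)
{
    char local[MSG_SIZE];

    (void)arg;
    if (fake_blocking_recv(local, sizeof(local)) != 0)
        return NULL;

    pthread_mutex_lock(&msg_lock);
    memcpy(shared_msg, local, sizeof(shared_msg));
    have_msg = 1;
    pthread_cond_signal(&msg_ready);
    pthread_mutex_unlock(&msg_lock);
    return NULL;
}

/* Consumer: block until a message has been published, then copy it out. */
void take_message(char *out, size_t size)
{
    assert(size > 0);
    pthread_mutex_lock(&msg_lock);
    while (!have_msg)                   /* loop guards against spurious wakeups */
        pthread_cond_wait(&msg_ready, &msg_lock);
    strncpy(out, shared_msg, size - 1);
    out[size - 1] = '\0';
    have_msg = 0;
    pthread_mutex_unlock(&msg_lock);
}
```

The main thread would start receiver_thread with pthread_create() and call take_message() when it needs the next message. Because the flag is set and tested under the same lock as the buffer, the consumer cannot observe the flag without also seeing the complete message.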

2. add
#define GRP_COMM (LAM_BASE_EVENT + 25)
to lam/include/event.h and share/include/event.h

3. add
#define PRGRPCOMM PRDAEMON
to lam/include/priority.h and share/include/priority.h
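
Since the reply quoted below stresses that a new event constant must not collide with one already in use, here is a small sketch of how a list of event numbers could be checked for duplicates. The base value and the two neighbouring events are made up for the example; they are not the contents of event.h, and the real header would have to be read to fill in the table.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder base, NOT the real LAM_BASE_EVENT value. */
#define DEMO_BASE_EVENT 1000

/* Hypothetical event table; the two existing entries are invented. */
#define DEMO_EV_A      (DEMO_BASE_EVENT + 23)
#define DEMO_EV_B      (DEMO_BASE_EVENT + 24)
#define DEMO_GRP_COMM  (DEMO_BASE_EVENT + 25)

/*
 * Returns the index of the first entry that duplicates an earlier
 * entry, or -1 if all event numbers are distinct.
 */
int find_event_collision(const int *events, size_t count)
{
    assert(events != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        for (size_t j = 0; j < i; j++) {
            if (events[i] == events[j])
                return (int)i;
        }
    }
    return -1;
}
```

Running find_event_collision() over an array of every event constant in use, with the new one included, gives -1 when the new number is free and the index of the clashing entry otherwise.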

The rest of the code is unrelated to the use of nrecv()/nsend().
The above is the main logic and the control flow of the messages.

I read a lot of the code of the other daemons under otb/sys. There
seems to be nothing special about their use of nrecv()/nsend(),
and I don't think I missed anything in the new daemon (right?).

I am diving into the LAM message-passing code now.
Any comments and suggestions are welcome!

Thanks and best regards,
Chao

> Sorry about the long delay in replying -- I've been out of town the
> last couple of days.
>
> It looks like you have the right idea, and I don't see anything
> obviously wrong from your code snippet. You took the approach that I
> would have taken in having your new "daemon" be a separate process.
> When looking at all the "daemons" in otb/sys/, keep in mind that the
> code is designed to be used in either one big process or a bunch of
> little processes. The "one big process" model was added after the
> project was well underway, and plays some games to make it work. One
> of those games is that nsend / nrecv don't actually block. Instead,
> they schedule activity to be done when control is returned to the
> kernel. There are some very tight restrictions placed on the use of
> nsend / nrecv when running this way. In particular, there can only
> be one nsend and one nrecv call scheduled during a single function
> call, then control must return to the kernel before the next send /
> recv can be posted. Also, packet length is strictly limited to being
> less than MAXNMSGSIZE bytes long. Since you are running in your own
> process, none of these apply to you (thankfully - makes life much
> easier that way).
>
> By the way, there's no reason you can't modify the lamd-conf.lamd
> file (the one for the one big-lamd model) to start your new_feature
> daemon next to the lamd. Your file would then look like:
>
> lamd $inet_topo $debug $session_prefix $session_suffix
> lamd_newfeature $debug $session_prefix $session_suffix
>
> I don't know if that's useful or not to you, but thought I would
> point out that it should work.
>
> Like I said, I don't see anything obvious in your code, so I'll point
> out a couple of things. If these don't help, I might need to look at some
> more of your code to be of any real help. First, your process should
> kinit() with a priority of PRDAEMON. There's really no benefit to
> having your own priority. There are a couple of services that
> require it, but I don't think that what you are trying to do would
> fit that category. One very, very important thing is that your EVNEW
> constant be defined to something that isn't used elsewhere. Look at
> the long comments in share/include/event.h to see how event numbers
> are chosen in LAM. This is a frequent issue when using the LAM
> communication layer.
>
> If none of that helps, the next thing is to try to use some of the
> LAM tools to look at where your messages are going. You can use
> state and bfstate commands (LAM must be configured with the
> --with-trillium option for these to be built and installed) to see
> what is running on a particular node and what is up with the
> communication buffers on a particular node.
>
> Hope this helps a bit. If not, let me know, including as much
> information about your daemon as possible. It's a bit hard to debug
> code at that level and even harder when I don't know what the code is
> doing ;).
>
> Brian
>
>