LAM/MPI General User's Mailing List Archives

From: Bodi Debnath (bodi.debnath_at_[hidden])
Date: 2007-10-23 11:10:14


I have a program that shares data between two executables through MPI
memory windows (one-sided communication, also called RMA). The program
does its own buffering before sending a lot of messages over MPI. It's
running on 64-bit SUSE with LAM 7.1.4.

The problem in the pseudo-ish code below is the following: both
processes expose their memory using MPI_Win_post / MPI_Win_start, do a
bunch of MPI_Puts, and close the epoch with MPI_Win_complete /
MPI_Win_wait. Right after that, each process reads its local buffer to
see what it received. I believe that when we read the local buffer, MPI
still hasn't finished delivering the data to it, because after I send a
lot of messages, the messages transmitted at the tail end of the epoch
are not present when I read the local buffer.

When I instead put a sleep before I read my local buffer (see the
commented-out line of code), I can read my messages fine. But if I
replace the sleep with an MPI_Barrier, for example, it doesn't work.

I'm not sure what the problem is. Is it the fact that both executables
are trying to write to each other's buffer at the same time? Is it the
multiple MPI_Puts? Is the data written to the local buffer not supposed
to be visible until some later point? Am I missing some other kind of
sync?

Pseudo-ish code below (code for checking message length etc., so that
we don't write messages larger than the window size, and other
protection code removed):

MPI_Group consists of the two executables
Both executables execute the following code:

function transmit() {
  do {
    MPI_Win_post();
    MPI_Win_start();

    while(messages_sent < max_messages_per_epoch && new_msg_exists()) {
      message = get_message_from_buffer();
      MPI_Put(message.data);
      MPI_Put(message.from);
      MPI_Put(message.to);
      ++messages_sent;
    }

    // Limit on number of messages due to shared window size
    // So we need to send over multiple epochs and after each epoch we
    // read from the shared window on the receive side and store messages
    // in our receive buffer.

    // Below we are passing the number of unsent messages and the number of
    // messages sent in this epoch
    messages_not_sent = get_num_remaining_messages();
    MPI_Put(messages_not_sent); // put at an agreed location in the window
    MPI_Put(messages_sent); // also put at another agreed location

    MPI_Win_complete();
    MPI_Win_wait();

    // sleep(1); // uncomment to make this work
    // awaiting_send is non-zero if remote transmitter has more messages to send
    awaiting_send = receive_messages();
  } while(messages_not_sent > 0 || awaiting_send > 0);
}

function receive_messages() {
  num_messages = read_shared_mpi_window(location_x);
  awaiting_send = read_shared_mpi_window(location_y);

  for(int msg_index=0; msg_index<num_messages; ++msg_index) {
    write_local_receive_buffer( read_shared_mpi_window(msg_index) );
  }

  return awaiting_send;
}
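
The "agreed locations" and the per-epoch message limit mentioned in the
comments above could be derived from a fixed window layout. The sketch
below is one hypothetical layout, not the one the real program uses:
window_header, msg_slot, MSG_DATA_BYTES (a fixed 64-byte payload), and
both helper functions are invented names and sizes, chosen only to show
how the offsets and the limit might be computed.

```c
#include <stddef.h>

/* Assumed fixed payload size per message (illustrative only). */
enum { MSG_DATA_BYTES = 64 };

/* Counters stored at the start of the window ("agreed locations"). */
typedef struct {
    int messages_sent;      /* what receive_messages() reads at location_x */
    int messages_not_sent;  /* what receive_messages() reads at location_y */
} window_header;

/* One message record: the three fields put per message in transmit(). */
typedef struct {
    int  from;
    int  to;
    char data[MSG_DATA_BYTES];
} msg_slot;

/* How many whole message slots fit behind the header in a window of
 * window_bytes bytes; 0 if even the header does not fit. */
size_t max_messages_per_epoch(size_t window_bytes)
{
    if (window_bytes < sizeof(window_header))
        return 0;
    return (window_bytes - sizeof(window_header)) / sizeof(msg_slot);
}

/* Byte offset of message slot msg_index from the start of the window. */
size_t msg_slot_offset(size_t msg_index)
{
    return sizeof(window_header) + msg_index * sizeof(msg_slot);
}
```

With a layout like this, both sides compute the same offsets from the
same constants, so the sender's target displacements and the receiver's
reads agree without exchanging any extra metadata.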