LAM/MPI General User's Mailing List Archives

From: Andriy Y. Fedorov (fedorov_at_[hidden])
Date: 2003-07-23 23:55:56


        Hello,

   I attach a very simple MPI program that posts a number of MPI_Isend()s and
   then creates a new thread to receive all the posted messages (consider the
   2-processor case). Note that the thread is created _after_ all the
   MPI_Isend()s, and only a single thread is within MPI at a time. The program
   completes OK when I just call thread_func() directly, but when I create a
   thread to execute the very same function, the MPI_Status I examine with
   MPI_Get_count() contains bogus tag and source values:

   1: MPI_Isend OK
   1: MPI_Isend OK
   1: Receiving 2048 bytes, tag = 48, src = 0
   1: MPI_Recv OK
   1: Receiving 2048 bytes, tag = 4294934530, src = 4294934530
   MPI_Recv: invalid tag argument (rank 1, MPI_COMM_WORLD)
   Rank (1, MPI_COMM_WORLD): Call stack within LAM:
   Rank (1, MPI_COMM_WORLD): - MPI_Recv()
   Rank (1, MPI_COMM_WORLD): - main()
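
   A side observation on the bogus value, offered as a hedged sketch: the
   number 4294934530 is what a negative 32-bit int looks like when it is
   printed with an unsigned format, which would be consistent with MPI_Recv
   rejecting the tag. The code below assumes the tag and source are 32-bit
   ints printed with an unsigned conversion such as %u; the helper names
   as_signed32 and show_bogus_value are hypothetical and are not part of the
   attached program.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reinterpret a value that was printed as an unsigned
 * 32-bit number as the signed 32-bit integer it most likely came from. */
int32_t as_signed32(uint32_t printed)
{
    /* Anything above INT32_MAX corresponds to a negative signed value:
     * subtract 2^32 to undo the wrap-around. */
    if (printed > (uint32_t)INT32_MAX)
        return (int32_t)((int64_t)printed - 4294967296LL);
    return (int32_t)printed;
}

/* Print the signed reading of the tag/source value from the log above. */
void show_bogus_value(void)
{
    printf("%d\n", (int)as_signed32(UINT32_C(4294934530)));
}
```

   Called from a main(), show_bogus_value() prints -32766, so under these
   assumptions both the tag and the source came back as the same negative
   number rather than as random garbage.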

   I checked the LAM/MPI documentation, which says:

   "If user programs utilized multiple threads, they must ensure that only one
   thread uses LAM at a time. Unpredictable results (read: crash and burn)
   will occur if multiple threads access LAM simultaneously."

   I believe that's what I have! Only one thread is in MPI!
   
   I compiled my program with the -D_REENTRANT and -lpthread options. Some
   messages in the archive suggested that MPI itself must be compiled with
   -D_REENTRANT. I recompiled LAM 7.0 with this flag, but it didn't help.

   I also read about MPI_Init_thread(), but a comment in the 7.0 source says:

   "Using 'MPI_THREAD_SERIALIZED' will cause LAM to place locks around all
    MPI calls such that only one thread will be able to enter the MPI
    library at a time; beware of this fact for portability with other MPI
    implementations. Even with multiple threads, deadlock is still
    possible when using 'MPI_THREAD_SERIALIZED' -- applications still need
    to be aware of this and code appropriately."
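
   To make the quoted comment concrete: placing a lock around every library
   call, so that only one thread is inside the library at a time, might look
   roughly like the sketch below. This is an illustration with plain
   pthreads, not LAM's actual implementation; library_call, serialized_call,
   run_workers, total_calls, and the counters are all hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* One global lock guarding every entry into the (hypothetical) library. */
static pthread_mutex_t library_lock = PTHREAD_MUTEX_INITIALIZER;

/* Library state; only touched while library_lock is held. */
static int threads_inside = 0;
static int max_threads_inside = 0;
static long calls_completed = 0;

/* Stand-in for a library routine that is not thread safe. */
static void library_call(void)
{
    threads_inside++;
    if (threads_inside > max_threads_inside)
        max_threads_inside = threads_inside;
    calls_completed++;
    threads_inside--;
}

/* Wrapper that serializes access: lock, call, unlock. */
void serialized_call(void)
{
    pthread_mutex_lock(&library_lock);
    library_call();
    pthread_mutex_unlock(&library_lock);
}

/* Each worker makes *arg calls through the serializing wrapper. */
static void *worker(void *arg)
{
    long ncalls = *(const long *)arg;
    for (long i = 0; i < ncalls; i++)
        serialized_call();
    return NULL;
}

/* Start up to MAX_WORKERS threads, wait for them all, and return the
 * largest number of threads ever observed inside the library at once. */
int run_workers(int nthreads, long ncalls)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (nthreads > MAX_WORKERS)
        nthreads = MAX_WORKERS;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &ncalls) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return max_threads_inside;
}

/* Total number of library calls completed so far. */
long total_calls(void)
{
    return calls_completed;
}
```

   With the lock in place, run_workers() should report that at most one
   thread was ever inside library_call() at the same moment. A wrapper like
   this holds the lock for the whole duration of a call, so a thread blocked
   inside one call keeps every other thread out, which may be what the
   comment's deadlock warning refers to.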

    Well... I don't need locks in this program! There's nothing to
    synchronize; it should just work! What is wrong: the LAM manual, which I
    cited here and followed, or this program?

    Please, anybody, help me out here! I am totally puzzled... If MPI just
    shouldn't be used from multiple threads, can you explain to me WHY
    (assuming everything is perfectly synchronized)? Thank you very much in
    advance!

    P.S. One more detail: the program crashes on Solaris, but runs fine on
    RH9 2.4.20-19.9.

-- 
  Andriy Fedorov
  
  Department of Computer Science,
  College of William & Mary
  P.O. Box 8795
  Williamsburg, VA 23185-8795, USA
  ---
  http://www.cs.wm.edu/~fedorov