Hi,
I was testing the following code. It is a simple
non-blocking send and non-blocking receive in a loop. I
tested with various numbers of receivers and various
numbers of messages. Quite often it hangs when the
messaging is heavy (say, 1 sender sending 10,000
messages to each of 6 receivers on a 1-node LAM
universe). I have attached the output and the sources.
Why does this happen? Is it a problem with my code or
is this behavior documented? What can I do to fix
this?
(For anyone not familiar with uudecode and uuencode:
copy the attachment below, from "begin 664
simSendRecvLam.TAR.GZ" to "end", into a file named
"test", save it, run the command uudecode test, then
run tar xvzf simSendRecvLam.TAR.GZ to extract the files.)
The attached uuencoded, gzipped tar file contains:
1) sampleOutput.txt - example of the output when it
crashes (it reports "internal MPI error: Invalid
argument (rank 9, MPI_COMM_WORLD)" for one or more of
the receivers)
2) appschema
3) MPI_Common_Comm.h & MPI_Init_Comm.h - just some
definitions of my own types, nothing important
4) MPI_Sim_Blocking_Comm.h & MPI_Sim_Blocking_Comm.cc
- the sources for the libsimComm.so library which
simulates blocking MPI calls with non-blocking MPI
calls. I have a valid reason for trying to simulate
blocking MPI calls with non-blocking MPI calls (it is
related to fault tolerance).
5) simSend.cc & simRecv.cc - sources for the executables
that use the libsimComm.so library
6) Makefile for the sources
Can someone help me understand this behavior? The odd
thing is that it sometimes works (say 50 percent of the
time) for the same appschema and number of messages.
It almost always executes correctly with fewer
messages or fewer receivers.
Thanks
Vinod