I will send config.log if necessary, but will need someone else to do
it (I cannot for non-technical reasons).
We are using LAM 7.0 on Sun Sparc Solaris 8 for interprocess messaging
within a single node. We have a test which sends 10 messages of about
49 kbytes each in rapid succession. The send is done using MPI_Bsend
(MPI_Buffer_attach was called during initialization, providing an 8
Mbyte buffer). There are no errors returned from the MPI_Bsend calls.
In the receiving process, several of the 10 messages are received
without error (the first 2, 3, or 4 messages) but then on the next
call to MPI_Recv, we get an error code 21=MPI_ERR_LOCALDEAD (the
sender is not really dead at this point). We have had no problems when
using "less stressing" messaging. This is with the tcp RPI. The
tunable rpi_tcp_short is at its default value of 64 kbytes. We assume
that is sufficient since each message is only about 49 kbytes.
Does this sound like any known bug fixed somewhere between 7.0 and
7.0.6? Should we be using a different RPI? It seems odd to be using
TCP when there is no actual networking involved, but this appears to
be the default.
-- David