On Sat, Jul 26, 2003 at 06:18:48PM -0500, Brian W. Barrett wrote:
> On Sat, 26 Jul 2003, Andriy Y. Fedorov wrote:
>
> > Could you please look into this question? If I do something wrong, why
> > don't you point it out, or there's a mistake in the LAM FAQ?
> > I attach one more test. In this test, if THREAD is defined, MPI is
> > initialized and used in a separate thread. If it is not, the thread function
> > is called directly. It fails for the same problem I've described earlier in
> > the first case and succeeds in the second.
>
> We aren't sure exactly what is going on, and are having trouble
> duplicating the problem on our Solaris 8 machines. Can you send us the
> output of 'mpicc -showme' and 'laminfo'?
>
> My current thought is that there is some strange (read: bad)
> interaction between Solaris threads (which are used by default in LAM
> with Solaris) and Pthreads. But I don't know that for sure.
>
> Brian

This is a LAM build bug involving threads and errno. I did a fresh
build on Solaris with the Sun compilers, and the SSI RPI modules are
not being compiled with -mt. As a result, functions in the RPIs do not
get the correct errno value when running in a thread other than the
main thread. Functions like swritev() in ssi_rpi_tcp_low.c use errno to
detect error conditions when writing to sockets, and they are thrown
off by the wrong value. When everything runs in the main thread, the
correct value of errno is picked up, which is why the code runs OK in
that case. Recompiling the SSI RPI modules with -mt fixes the problem.
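
For anyone who wants to check their own build, here is a minimal sketch of
the effect. It is not LAM code: worker() and errno_is_per_thread() are
hypothetical names made up for this illustration. It assumes a POSIX
system with Pthreads and that the file is compiled with the platform's
threading option (-mt with the Sun compilers, typically -pthread
elsewhere). A worker thread makes a write() call that must fail, then
records the errno value it sees and the address errno resolves to. In a
correctly built threaded program the worker sees EBADF and its errno
lives at a different address than the main thread's. If both threads
resolve errno to the same location, code running outside the main thread
can read a stale value, which matches the failure described above.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* What the worker thread observed (hypothetical helper type). */
struct errno_probe {
    int  value;  /* errno right after the failed write() */
    int *addr;   /* address errno resolves to in the worker thread */
};

/* Runs in the second thread: force a failure and record errno. */
static void *worker(void *arg)
{
    struct errno_probe *probe = arg;
    ssize_t rc;

    errno = 0;
    rc = write(-1, "x", 1);          /* invalid descriptor, so this fails */
    probe->value = (rc == -1) ? errno : 0;
    probe->addr = &errno;
    return NULL;
}

/*
 * Returns 1 if the worker saw EBADF through its own errno location,
 * 0 if errno appears to be shared with the main thread or held the
 * wrong value, and -1 if the thread could not be created or joined.
 */
int errno_is_per_thread(void)
{
    pthread_t tid;
    struct errno_probe probe = { 0, NULL };

    if (pthread_create(&tid, NULL, worker, &probe) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    return probe.value == EBADF && probe.addr != &errno;
}
```

Calling errno_is_per_thread() from main() and printing the result is
enough to compare a build with the threading flag against one without it.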
-nick