Hi all,

I'm doing a simple image convolution.

At this point, I'm just sending different rows of the image from the server (rank 0) to the clients (rank > 0), having each client do the convolution for its row and send the result back to the server. [Once I understand what's going on, it's trivial to send all the rows to all the different clients and then do a receive from all of them.]
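
A minimal sketch of the per-row work a client might do, assuming a 1-D kernel of odd length and zero padding at the row borders. The name convolve_row and its parameters are made up for illustration and are not taken from the actual program:

```c
#include <assert.h>
#include <stddef.h>

/* Convolve one image row with a 1-D kernel of odd length ksize.
 * Pixels outside the row are treated as zero (an assumption; the
 * real program may handle borders differently). */
void convolve_row(const float *in, float *out, size_t width,
                  const float *kernel, size_t ksize)
{
    assert(ksize % 2 == 1);
    size_t half = ksize / 2;

    for (size_t x = 0; x < width; ++x) {
        float acc = 0.0f;
        for (size_t k = 0; k < ksize; ++k) {
            /* Convolution flips the kernel:
             * out[x] = sum over k of kernel[k] * in[x + half - k]. */
            size_t idx = x + half;
            if (idx < k)
                continue;               /* source index left of the row */
            size_t src = idx - k;
            if (src >= width)
                continue;               /* source index right of the row */
            acc += kernel[k] * in[src];
        }
        out[x] = acc;
    }
}
```

With in = {1, 2, 3, 4} and kernel = {1, 0, -1}, this gives out = {2, 2, 2, -3}.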

I'm able to send to the clients and receive from them only once.

After that, according to the output of mpirun -sa, the clients exit cleanly (kill=0, status=0), which they aren't supposed to do.

So the server can't send to or receive from them anymore.

As a result, the server (rank 0) ends up with kill=1 (bad exit) and signal=13 (SIGPIPE, i.e. a broken pipe, probably because it's trying to send to processes that have already exited).

I'm running LAM 7.1.2 (MPI-2) on Fedora Core 5.

And it's C code, not Fortran.

All insights are welcome.

--Elf

--
"For those who understand, no explanation is needed; for those who do not, none will do."