Hi all,
I'm doing a simple image convolution.
At this point, I'm just sending different rows of the image from the server
(rank 0) to the clients (rank > 0), having each client do the convolution for
its row and send the result back to the server. [Once I understand what's
going on, it's trivial to send all the rows to all the different clients and
then do a receive from all of them.]
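To make the per-row step concrete, here is a minimal sketch of what a client might do with the row it receives. It is not the actual code from this program: the name convolve_row, the float pixel type, the 1-D kernel of odd length, and the zero padding at the row ends are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: convolve one image row with a 1-D kernel.
 * Assumptions: klen is odd, the kernel is centred on each pixel,
 * and pixels outside the row count as zero. */
void convolve_row(const float *in, float *out, size_t width,
                  const float *kernel, size_t klen)
{
    size_t half = klen / 2;

    for (size_t x = 0; x < width; ++x) {
        float acc = 0.0f;
        for (size_t k = 0; k < klen; ++k) {
            /* Source index is x + half - k (the kernel is flipped,
             * as in a true convolution). Skip out-of-range pixels. */
            size_t shifted = x + half;
            if (shifted >= k && shifted - k < width)
                acc += in[shifted - k] * kernel[k];
        }
        out[x] = acc;
    }
}
```

With the row {1, 2, 3, 4} and the kernel {1, 0, -1}, this writes {2, 2, 2, -3} to out; the last value differs because the pixel to the right of the row is taken as zero.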
I'm able to send to the clients and receive from them only once.
After this (according to the output of -sa), the clients exit cleanly (kill=0
and status=0), although they aren't supposed to.
So the server can't send to or receive from them anymore, and the server
(rank 0) ends up with kill=1 (bad exit) and signal=13 (SIGPIPE, i.e. broken
pipe, probably because it's trying to send to processes that have exited).
I'm running LAM 7.1.2/MPI 2 on Fedora Core 5, and the code is C, not Fortran.
All insights are welcome.
--Elf
--
"For those who understand, no explanation is needed; for those who do not,
none will do."