On Wed, 6 Aug 2003, etienne gondet wrote:
> >- in the case where you do not use an app schema, you have the maxprocs
> >argument of MPI_Comm_spawn set to 1, meaning that only one "block"
> >executable is launched.
>
> I don't understand what "twofold" means, but on that IBM SP4 the block
> is never started on the other node. I log in to the other nodes with rsh
> and I never see a process called block with ps -edf, and the driver
> process is deadlocked. The problem occurs before the pingpong and is
> related to process management, not to the communication protocol and
> buffers. I understand that, because I spawn a single-process block, it
> should deadlock later in the pingpong after 64k. But I reduced the number
> of processes just in case.
Are you sure that block is not being spawned on the current node? Without
an app schema, it should be launched on n0, not n1. There could also be a
buffering issue, so that you simply do not see all of the output before the
deadlock.
I would suggest two things:
- lower your message size to less than 64k (say, 20 bytes).
- in the cases without app schemas, set the maxprocs argument of
  MPI_Comm_spawn to 2, not 1.
Unless something else is going wrong, either or both of these should allow
your program to complete.
> >lamhalt will hang for a while if any of the LAM daemons have already died.
>
> A very long while.
It should only wait for about 15 seconds before giving up. Can you attach
a debugger and see exactly where lamhalt is stuck?
(You may need to recompile LAM with -g to get useful information here.)
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/