LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-04-29 13:40:12


On Tue, 29 Apr 2003, Pak, Anne O wrote:

> is there a reason why for one parallelized program, i'd be able to
> parallelize it and spawn it over, say 30 nodes, but then for another
> program (which i've parallelized in the same way as the first), i'd only
> be able to parallelize/spawn it over 10 nodes? i haven't changed the
> lamboot configuration or anything between the running of these two
> programs!!!

From your description, it sounds like that should work fine.

> when i try to parallelize over more nodes (for the latter program),
> MATLAB crashes and i get errors like:
>
> MPI_Recv: process in local group is dead (rank 0, comm 3)

This typically means that a remote process has died unexpectedly, or you
have tried to explicitly receive from someone who is now dead. For
example, the remote process exited, seg faulted, or otherwise quit without
calling MPI_Finalize properly. Or it did call MPI_Finalize, and you
explicitly tried to receive from it after it was gone.
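
As a rough illustration of the first case (the peer quits before it ever
sends), here is a small sketch. It is an analogy only, not LAM/MPI code: it
uses plain Python standard-library pipes, and the names recv_from_child,
PeerDied, GOOD_PEER, and DEAD_PEER are made up for this example. The point is
just that a blocking receive can only report a failure once the other side is
gone.

```python
import subprocess
import sys


class PeerDied(Exception):
    """Raised when the peer exits before sending the expected message."""


def recv_from_child(child_source: str) -> str:
    """Run child_source in a new Python process and receive one line from it.

    Hypothetical helper for illustration only; this is not an MPI call.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Blocks until a line arrives or the child's end of the pipe is closed.
    line = child.stdout.readline()
    child.stdout.close()
    status = child.wait()
    if line == "":
        # End-of-file with no data: the peer is gone and never sent anything.
        raise PeerDied(f"peer exited with status {status} before sending")
    return line.rstrip("\n")


# A well-behaved peer sends its message and then exits.
GOOD_PEER = "print('result ready')"
# A peer that quits early, standing in for a crash or a premature exit.
DEAD_PEER = "import sys; sys.exit(3)"

if __name__ == "__main__":
    print(recv_from_child(GOOD_PEER))
    try:
        recv_from_child(DEAD_PEER)
    except PeerDied as err:
        print("receive failed:", err)
```

In a real MPI program the equivalent thing to check is whether every rank
actually reaches its matching send before it exits or crashes.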

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/