Is there a reason why I'd be able to parallelize one program and spawn it
over, say, 30 nodes, but another program (which I've parallelized the same
way as the first) can only be spawned over 10 nodes? I haven't changed the
lamboot configuration or anything else between running these two programs!
When I try to spawn the second program over more nodes, MATLAB crashes and
I get errors like:
MPI_Recv: process in local group is dead (rank 0, comm 3)
Any ideas?
Thanks,
Anne
Sent: Tuesday, April 29, 2003 5:59 AM
To: General LAM/MPI mailing list
Subject: Re: LAM: MPI_Final and adding nodes
On Mon, 28 Apr 2003, Pak, Anne O wrote:
> [snipped]
> I am running into several problems:
>
> 1. I can't seem to run this entire MATLAB/MEX/MPI program more than once
> without it crashing MATLAB with the error:
>
> ----------------------------------------------------------------------------
> It seems that at least one rank invoked some MPI function after
> invoking MPI_FINALIZE. The only information that I can give is that
> it was PID 26874 on host galadriel.
>
> It was probably rank (unknown) on MPI_COMM_WORLD, but I can't say that for
> sure...
>
> ----------------------------------------------------------------------------
>
> So I can get this MATLAB/MEX/MPI program to run once: it works, spews
> back my MATLAB outputs, and seems to exit completely, returning me to my MATLAB prompt.
> returning me my MATLAB prompt. But when I try to run the program again
> from the MATLAB prompt, MATLAB crashes and I get the above error.
This sounds like MATLAB is loading your program into memory once and then
running it multiple times within the same process space. The MPI standard
says that MPI_INIT and MPI_FINALIZE may each be called only once within a
process. I don't know much about MEX, but is there a way to ensure that
MPI_INIT is only called once, and that MPI_FINALIZE is also only called
once (i.e., when you terminate your program and/or the MATLAB session)?
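A minimal sketch of that "initialize once, finalize once" pattern, with the assumptions stated up front: it is plain C so it compiles without MPI or MATLAB headers, and fake_mpi_init(), fake_mpi_finalize() and mex_entry() are hypothetical stand-ins. In a real MEX file they would correspond to MPI_Init(), MPI_Finalize() and the body of mexFunction(), the static flag could be replaced by a call to MPI_Initialized(), and atexit() would be replaced by MATLAB's mexAtExit(). The point is that a static variable survives from one call to the next because MATLAB keeps the MEX file loaded in the same process.

```c
/* Sketch: initialize once, finalize once, for code that is loaded once but
 * invoked many times in one process (as MATLAB does with a MEX file).
 * ASSUMPTION: all functions below are hypothetical stand-ins so the sketch
 * compiles without MPI or MATLAB headers. In a real MEX file:
 *   fake_mpi_init()     -> MPI_Init()
 *   fake_mpi_finalize() -> MPI_Finalize()
 *   mex_entry()         -> body of mexFunction()
 *   atexit()            -> mexAtExit()                                   */
#include <assert.h>
#include <stdlib.h>

static int init_calls = 0;      /* times "MPI" has been initialized      */
static int finalize_calls = 0;  /* times "MPI" has been finalized        */
static int invocations = 0;     /* times MATLAB has called the MEX entry */

/* Stand-in for MPI_Init: MPI permits only one call per process. */
static void fake_mpi_init(void)
{
    assert(init_calls == 0 && "MPI_Init called more than once");
    init_calls++;
}

/* Stand-in for MPI_Finalize; registered once, runs at process exit. */
static void fake_mpi_finalize(void)
{
    finalize_calls++;
}

/* Stand-in for mexFunction: runs every time the program is called. */
void mex_entry(void)
{
    /* A static flag persists between calls because the code stays loaded
     * in the same process; real code could ask MPI_Initialized() instead. */
    static int initialized = 0;
    if (!initialized) {
        fake_mpi_init();
        atexit(fake_mpi_finalize);
        initialized = 1;
    }
    invocations++;
    /* ... spawn workers, compute, collect results ... */
}
```

Calling mex_entry() three times initializes only once, and finalization is deferred until the process exits. One caveat: a handler registered with mexAtExit() also runs when the MEX file is cleared (e.g., with `clear mex`), after which MPI cannot be initialized again in that MATLAB session; mexLock() can keep the MEX file from being cleared.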
> 2. The execution times when I run this MATLAB/MEX/MPI program do not
> seem to be constant at all. The very first time I invoke this program,
> the execution time is sometimes 10 times larger than subsequent
> invocations. Is this some sort of transient behavior?
It's hard to say without more specific information. I can't think of an
obvious reason for this other than transient behavior (e.g., the load on
the machines that you're running on).
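One way to tell a one-off start-up cost from genuinely variable run times is to time several consecutive invocations (MPI_Wtime() returns wall-clock seconds as a double) and report the first run separately from the rest. A sketch of the bookkeeping, assuming the timings have already been collected into an array; summarize_timings() and timing_summary are hypothetical names:

```c
/* Hypothetical helper: separate the first (possibly "cold") run from the
 * steady-state runs. t[i] is the wall-clock time of invocation i, e.g. the
 * difference of two MPI_Wtime() calls taken around it. */
#include <assert.h>
#include <stddef.h>

typedef struct {
    double first;        /* time of the first invocation                */
    double steady_mean;  /* mean time of invocations 2..n (0.0 if n==1) */
} timing_summary;

timing_summary summarize_timings(const double *t, size_t n)
{
    assert(t != NULL && n > 0);
    timing_summary s = { t[0], 0.0 };
    if (n > 1) {
        double sum = 0.0;
        for (size_t i = 1; i < n; i++)
            sum += t[i];
        s.steady_mean = sum / (double)(n - 1);
    }
    return s;
}
```

If the first time stays large while the steady-state mean is stable, that points to a one-time cost (for instance, loading the MEX file) rather than to the computation itself; if the later runs also vary widely, machine load is the more likely suspect.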
> 3. It seems that as I use more nodes (i.e., spawn the parallelized
> computation across more computers), the computation time of the
> parallelized calculation is WORSE!!! Does the execution time of
> MPI_Comm_spawn increase with more spawned processes?
No. However, not every problem gets faster when you throw more processors
at it. This is the general nature of parallel programming. Check out
this old post on the list:
http://www.lam-mpi.org/MailArchives/lam/msg04857.php
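As a rough illustration of why adding processors can make things slower, consider a toy cost model (an assumption for illustration, not something taken from the post above): T(p) = serial + work/p + comm*p, where the last term stands for communication and coordination overhead that grows with the number of processes p. Setting dT/dp = -work/p^2 + comm = 0 gives a best p near sqrt(work/comm); beyond that, each extra process adds more overhead than it saves. model_time() and best_proc_count() are hypothetical names:

```c
/* Toy cost model (ASSUMPTION, for illustration only):
 *   T(p) = serial + work / p + comm * p
 * serial: part of the job that does not parallelize
 * work:   perfectly divisible work
 * comm:   per-process communication/coordination cost               */
#include <assert.h>

double model_time(double serial, double work, double comm, int p)
{
    return serial + work / p + comm * p;
}

/* Return the process count in 1..max_p with the smallest modeled time. */
int best_proc_count(double serial, double work, double comm, int max_p)
{
    assert(max_p >= 1);
    int best = 1;
    double best_t = model_time(serial, work, comm, 1);
    for (int p = 2; p <= max_p; p++) {
        double t = model_time(serial, work, comm, p);
        if (t < best_t) {
            best_t = t;
            best = p;
        }
    }
    return best;
}
```

With serial = 1, work = 100 and comm = 1 (arbitrary units), best_proc_count(1.0, 100.0, 1.0, 30) returns 10: T(10) = 21, while T(30) is about 34.3, so spreading the job over 30 processes is slower than spreading it over 10.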
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/