Hello:
I am using MPI_Comm_spawn to spawn processes on multiple nodes, with the aim
of parallelizing some calculations.
I am calling these MPI functions via a MEX function, which I in turn call
from MATLAB.
So, as a brief explanation of what I'm doing: I have a MATLAB program whose
inputs I wish to split up amongst my computer nodes. Each computer node will
do the *same* calculation, but only on the subportion of inputs scattered to
it. The results of each node's calculations will be fed back to the master
node for further processing.
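To make the splitting concrete, here is a minimal sketch of how the inputs
could be divided into per-node subportions. It is an illustration only, not
my actual code: the helper name split_inputs and its parameters are
hypothetical, and it assumes the inputs form a flat array that is divided as
evenly as possible. The two arrays it fills have the shape that
MPI_Scatterv expects for its sendcounts and displs arguments.

```c
#include <assert.h>

/* Hypothetical helper (not from my real program): split n_inputs items as
 * evenly as possible over n_nodes nodes. counts[i] is the number of items
 * node i receives and displs[i] is the offset of its first item. Both
 * arrays must have room for n_nodes entries. */
void split_inputs(int n_inputs, int n_nodes, int *counts, int *displs)
{
    assert(n_inputs >= 0 && n_nodes > 0);

    int base = n_inputs / n_nodes;   /* every node gets at least this many */
    int extra = n_inputs % n_nodes;  /* the first `extra` nodes get one more */
    int offset = 0;

    for (int i = 0; i < n_nodes; i++) {
        counts[i] = base + (i < extra ? 1 : 0);
        displs[i] = offset;
        offset += counts[i];
    }
}
```

For example, 10 inputs over 3 nodes would give subportions of 4, 3 and 3
items, starting at offsets 0, 4 and 7.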
The calling program is in MATLAB, but all the nodes I have are only equipped
with C. I am passing my inputs to a MEX function, which then passes them to
a C subroutine that also contains MPI calls (such as MPI_Scatter,
MPI_Bcast, MPI_Reduce, etc.) to distribute the inputs to the slave nodes.
I am running into several problems:
1. I can't seem to run this entire MATLAB/MEX/MPI program more than once
without it crashing MATLAB with the error:
-----------------------------------------------------------------------------
It seems that at least one rank invoked some MPI function after
invoking MPI_FINALIZE. The only information that I can give is that
it was PID 26874 on host galadriel.
It was probably rank (unknown) on MPI_COMM_WORLD, but I can't say that for
sure...
-----------------------------------------------------------------------------
So I am successful in getting this MATLAB/MEX/MPI program to run once: it
works, spews back my MATLAB outputs and seems to exit completely, returning
me to the MATLAB prompt. But when I try to run the program again from the
MATLAB prompt, MATLAB crashes and I get the above error.
2. The execution times when I run this MATLAB/MEX/MPI program do not seem
to be constant at all. The very first time I invoke this program, the
execution time is sometimes 10 times longer than on subsequent invocations.
Is this some sort of transient behavior?
3. It seems that as I use more nodes (i.e. parallelize the computation
across more computers), the computation time of the parallelized
calculation is WORSE!!! Does the execution time of MPI_Comm_spawn increase
with more spawned processes?
I hope someone can shed some light on all these issues.
Thank you,
Anne
___________________________________________________
Anne Pak, L1-50
Building 153 2G8
1111 Lockheed Martin Way
Sunnyvale, CA 94089
(408) 742-4369 (W)
(408) 742-4697 (F)