LAM/MPI General User's Mailing List Archives

From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2005-07-20 17:48:36


On Jul 16, 2005, at 5:27 PM, Lei_at_ICS wrote:

> I am trying to construct a very simple prototype based on your
> suggestion #1.
> And I have another question now. In a normal MPI run, the LAM daemon
> network is started before mpirun, and mpirun will specify how many and
> which
> PEs to use by using, e.g., mpirun -np 3 a.out or mpirun n0-2 a.out.

Correct.

> In the following quoted design, how does the master spawn a bunch of
> slaves on the PEs that I specify? In
> MPI_COMM_SPAWN(command, argv, maxprocs, info, root, comm, intercomm,
> array_of_errcodes)
> there is a way to specify the max number of procs, but if the master is
> not
> started by mpirun, wouldn't all maxprocs processes be spawned to the
> local PE?

I think you're confusing the issue here. See below.

> Is there a way to start a LAM daemon network among a list of IPs
> using MPI_Init(int *argc, char ***argv) from a sequential program like
> a.out?

Yes and no. LAM *must* have a universe before any MPI application will
run. So if you run a.out without a LAM universe, you'll get an error
message.

Hence, you must lamboot before you run any MPI application under LAM.
You can do this before you run matlab, or from your mex script (e.g.,
fork/exec a lamboot). Once you have a lamboot, then you can call
MPI_Init and start doing things like MPI_Comm_spawn. In addition, the
design that I gave below (I think you cut-n-pasted that from a message
from the LAM mailing list archives, right?) assumes that there is a LAM
universe running because it uses published names, etc. which only exist
if a LAM universe exists.
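If you go the fork/exec route, the mex gateway can check for a running universe and boot one itself before calling MPI_Init. A minimal sketch in C, with the caveat that the hostfile name "hosts" and the use of system() are illustrative assumptions (lamnodes exits nonzero when no universe is up):

```c
#include <stdio.h>
#include <stdlib.h>

/* Make sure a LAM universe exists before MPI_Init is called.
 * "hosts" is an assumed hostfile name; adjust for your site. */
int ensure_lam_universe(void)
{
    /* lamnodes lists the nodes of the current universe; it fails
     * if there is no universe running. */
    if (system("lamnodes > /dev/null 2>&1") != 0) {
        /* No universe yet -- boot one across the listed hosts. */
        if (system("lamboot hosts") != 0) {
            fprintf(stderr, "lamboot failed; cannot run MPI\n");
            return -1;
        }
    }
    return 0;
}
```

Call this from the mex gateway before MPI_Init; if it returns 0, the universe is up and MPI_Init will succeed.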

So, once you have a LAM universe, you can launch MPI jobs in one of
three ways:

1. "Singleton", where you just "./a.out" (where a.out invokes
MPI_Init). This will make a.out be an MPI process, and it will have an
MPI_COMM_WORLD size of 1.

2. With mpirun.

3. With MPI_Comm_spawn[_multiple].
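Options 1 and 3 combine naturally: run "./a.out" directly (after lamboot) to become a singleton, then spawn from there. A sketch, where "./slave" is an assumed executable name:

```c
#include <mpi.h>
#include <stdio.h>

/* Singleton MPI_Init followed by MPI_Comm_spawn.  Run as plain
 * "./a.out" (no mpirun) after lamboot. */
int main(int argc, char *argv[])
{
    MPI_Comm children;
    int world_size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    printf("singleton MPI_COMM_WORLD size: %d\n", world_size);

    /* Placement of the spawned processes is up to the MPI
     * implementation; with MPI_INFO_NULL, LAM schedules them
     * across the booted universe rather than piling them all
     * on the local node. */
    MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 3, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    /* ... communicate with the children over the "children"
     * intercommunicator ... */

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}
```

This also answers the "wouldn't all maxprocs processes be spawned to the local PE?" question above: no -- spawned processes are scheduled across the universe, not confined to the node that called MPI_Comm_spawn.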

So what I was suggesting with that design is that you will probably
lamboot before you run matlab (or you can make your mex script smart
enough to run lamboot itself), and then have a mex interface that calls
MPI_Init. This will give you a singleton MPI process, where you can
look for published names, etc. Then you can spawn a master or connect
to the existing master... etc.
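The mex-side logic could look roughly like this. The service name "matlab-master" and the "./master" executable are illustrative choices, not fixed names:

```c
#include <mpi.h>

/* After MPI_Init: look for a published name; connect to the
 * running master if found, otherwise spawn one. */
MPI_Comm connect_or_spawn_master(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm master;

    /* A failed lookup is a fatal error by default; switch to
     * MPI_ERRORS_RETURN so we can test the return code. */
    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (MPI_Lookup_name("matlab-master", MPI_INFO_NULL, port)
        == MPI_SUCCESS) {
        /* Master already running -- connect to it. */
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                         &master);
    } else {
        /* No master yet -- spawn one; it publishes the name. */
        MPI_Comm_spawn("./master", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &master, MPI_ERRCODES_IGNORE);
    }
    return master;
}
```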

The reason for this persistent master-based design is that once your
mex script completes, Matlab apparently unloads all the supporting C
libraries (including MPI). Hence, your script must invoke MPI_Finalize
before it returns to the user, because the MPI library will be unloaded
anyway when the script completes -- calling MPI_Finalize first just
shuts MPI down gracefully. This is the whole motivation for
using MPI_Comm_spawn to launch a master that can survive when Matlab
unloads the mex script and supporting libraries.
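The master's side of that design is a publish/accept loop. A sketch, assuming the same illustrative service name "matlab-master" as above:

```c
#include <mpi.h>

/* Persistent master: outlives any single mex invocation,
 * publishes a name, and accepts connections from later mex
 * calls.  Spawning the slaves is elided. */
int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Init(&argc, &argv);

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("matlab-master", MPI_INFO_NULL, port);

    /* Serve one mex connection at a time; each accept/disconnect
     * pair corresponds to one Matlab call. */
    for (;;) {
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                        &client);
        /* ... receive work, farm it out to the slaves, send the
         * results back ... */
        MPI_Comm_disconnect(&client);
    }

    /* Unreachable in this sketch; a real master would exit the
     * loop on a shutdown message, then:
     * MPI_Unpublish_name("matlab-master", MPI_INFO_NULL, port);
     * MPI_Close_port(port);
     * MPI_Finalize(); */
}
```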

Make sense?

> Hope my question makes sense. BTW, your design seems to be exactly
> what I wanted.
>
> Thanks,
>
> -Lei
>
>
> -------------------- quoted msg ------------------------
>
> Actually, my ordering wasn't exactly right. Try this:
>
>> - your matlab script launches
>> - it calls MPI_Init
>> - check for a published name
>> - if the published name does not exist
>> - spawn a master (i.e., a new, independent process)
>> - the master spawns a bunch of slaves to do the work
>> - the master publishes a name
>> - if the published name does exist
>> - MPI_Comm_connect to the master
>> - the matlab script sends a bunch of work to the master
>> - the master farms it out to all the slaves
>> - the slaves do all the work and eventually send the result(s) to the
>> master
>> - the master sends the result(s) to the matlab script
>> - the matlab script disconnects from the master
>> - the matlab script finishes
>
> --
> {+} Jeff Squyres
>
>
>
>
> Jeff Squyres wrote:
>
>> On Jul 13, 2005, at 3:16 AM, Lei_at_ICS wrote:
>>
>>
>>
>>> I am given a Matlab+C program for me to parallelize. What I need to
>>> do
>>> is to parallelize a C routine that is called by another C routine,
>>> which in
>>> turn is called from Matlab. There are large and complicated C data
>>> structures that are passed in and out of the C routine being
>>> parallelized.
>>>
>>> My question is: is there a way to spawn MPI processes from a non-MPI
>>> process, and at the same time pass in and out C data structures
>>> between
>>> the parent sequential and the child MPI processes.
>>>
>>>
>>
>> So that's 2 questions:
>>
>> 1. can a non-MPI process spawn MPI processes: sort of. However, it's
>> probably easiest if your non-MPI process simply calls MPI_INIT and
>> becomes an MPI process (i.e., you don't have to launch it via mpirun
>> --
>> if it calls MPI_INIT, it will simply get an MPI_COMM_WORLD size of 1,
>> and then call MPI_COMM_SPAWN from there to launch more MPI processes).
>>
>> Other people have done this with Matlab before (indeed, there is a
>> package out there with MPI bindings for Matlab MEX), and there are
>> definitely some issues that need to be thought out first. Search the
>> LAM mailing list archives -- this stuff has been discussed extensively
>> before.
>>
>> 2. Can you pass C structures from the parent to the child: if you
>> follow the suggestion in #1, there's nothing to pass -- Matlab itself
>> becomes the parent MPI process; then you use standard MPI send/receive
>> semantics to pass data from the Matlab script (or C routine) to the
>> spawned processes. If you use a different mechanism, then you can use
>> traditional Unix IPC mechanisms to pass the data (pipes, files, local
>> sockets, shared memory, etc.).
>>
>> Hope that helps.
>>
>>
>>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/