Jeff:
First of all, thank you for your detailed reply! I have a question about
Open MPI.
> You can also do stuff like allow MPI_COMM_ACCEPT to block in a thread
> while your app goes off and does other stuff. This allows some fairly
> interesting scenarios.
I thought I was already doing this in LAM. Is this only possible in Open
MPI?
One other question:
Imagine my scenario from before. My system consists of multiple Services,
all of which may be connected to other Services through MPI. When each
Service starts up, I launch a thread that calls MPI_COMM_ACCEPT so
connections can be made with other Services. However, once the accept
thread is running and I try to actually *connect* to another Service (from
the main thread), something fails (I get a lamd kernel error). I take it
this is because I am working with threads. In this scenario, is Open MPI
required? I am not currently using MPI_Init_thread; maybe I should look
into that. Any advice on this endeavor?
Regards!
Andy Tarpley
----- Original Message -----
From: "Jeff Squyres" <jsquyres_at_[hidden]>
To: "General LAM/MPI mailing list" <lam_at_[hidden]>
Sent: Tuesday, October 19, 2004 5:01 PM
Subject: Re: LAM: Question about migrating from PVM to MPI.
> On Oct 18, 2004, at 9:07 AM, atarpley wrote:
>
>> I am trying to convert an existing system from PVM to MPI. I have a few
>> questions and statements and I would appreciate any comments on them.
>>
>> The system requires dynamic process management. I understand MPI2 has
>> this
>> functionality. The system basically has a Service Configurator that
>> launches multiple Services. Each Service can be connected to any
>> number of
>> other Services. With MPI, there is a MPI_Comm_spawn method. I have
>> identified this as the best way to launch external executables with MPI.
>>
>> Question 1: Does the application spawned by MPI_Comm_spawn have to be an
>> MPI
>> application itself? I have read that it does, but I have successfully
>> launched Unix apps like xcalc, which I know is not an MPI app.
>
> The MPI standard says yes, they have to be MPI applications. LAM/MPI, for
> example, expects them to be MPI applications and will hang the parent(s)
> until a spawned non-MPI app exits, at which point a run-time error is
> raised in the parent(s).
>
>> Once Services are launched from the Service Configurator, they must
>> establish communication with each other. I have identified the socket
>> like
>> behavior of MPI as suitable for this (MPI_Open_port, MPI_Comm_connect,
>> etc).
>
> Right.
>
>> Question 2: My system needs to be able to handle dynamic Service
>> disconnection
>> commands. If I wanted to disconnect Service A from B and connect A to C,
>> it
>> seems that I'd have to instruct A to disconnect from B, tell C to call
>> MPI_Comm_accept, then tell A to connect to C. Is there any way to do
>> this in a non-blocking manner?
>
> Unfortunately, not without threads. The MPI standard defines these calls
> as blocking. LAM, for example, has no timeout period -- accept will
> hang indefinitely waiting for someone to connect.
>
>> What if I wanted to connect multiple Services to Service
>> C? Would Service C have to do multiple blocking MPI_Comm_accept calls?
>> Wouldn't this create a logjam with the other Services trying to
>> connect to Service C?
>
> Yes. Just like sockets. There's really no other way to do it -- if you
> want to have N entities connect to 1 entity, then you have to serialize
> somewhere.
>
>> Question 3: My system requires a high fault tolerance. With permanent
>> MPI
>> pipelines being open between Services and between Services and the
>> Service
>> Configurator, isn't there a significant chance for an error in
>> communication to bring down the entire MPI system?
>
> Yes. You might want to examine the MPI definition of "connected" and its
> relation to MPI_FINALIZE and MPI_ABORT -- these things are discussed in
> the MPI-2 standard, the dynamic processes chapter. The end result is that
> MPI allows an error to take down *all* connected processes, but allows an
> implementation to do something better if it wants.
>
>> Is there any way to recover smoothly from
>> a seg fault or fatal error in communication? As a test, I purposely
>> caused a
>> seg fault in a Service while it was connected to another Service with MPI
>> and
>> it brought down both Services. Any way around this?
>
> Don't have seg faults. ;-)
>
> Right now (i.e., the current state of MPI implementations), not really.
> :-(
>
> FT is an area still largely unexplored in MPI. Some people have done some
> work in this area -- e.g., FT-MPI has done some interesting stuff, and
> we'll be extending that work, LAM's checkpoint/restart stuff, and a
> variety of other FT things (like data fault tolerance, run-time exception
> tolerance, etc.) in Open MPI.
>
>> Question 4: Are there any caveats to making an MPI shared object? I
>> would
>> like all of the Services to dynamically use a shared Dispatcher object
>> that
>> uses MPI as the message passing paradigm.
>
> I assume you mean a shared library that is loaded and unloaded from a
> process at run-time?
>
> No, that should work fine. However, be aware that the MPI standard says
> that MPI_INIT/MPI_INIT_THREAD and MPI_FINALIZE are only allowed to be
> called once per process. That being said, you can *probably* get around
> that error if you *completely* unload the MPI portion of your app from
> the process -- i.e., there's no state left to tell MPI (upon a later
> re-load) that it was previously MPI_INIT'ed.
>
>> Question 5: Lastly, I don't even know if MPI is right for this kind of
>> system.
>> The Service Configurator and Services will always be using just one MPI
>> process each. So in effect, the only thing I am using MPI for would be
>> the
>> message passing between Services and between Services and the Service
>> Configurator. Is this proper use of MPI?
>
> Sure. MPI is a generic message passing system, and most MPI
> implementations define an MPI process as an operating system process.
> However, not many of them support concurrency in threads -- Open MPI does.
> Open MPI supports MPI_THREAD_MULTIPLE, meaning that even though Open MPI
> defines an MPI process as an OS process, you can still use multiple
> threads concurrently and even have one thread send a message to another
> thread. You can also do stuff like allow MPI_COMM_ACCEPT to block in a
> thread while your app goes off and does other stuff. This allows some
> fairly interesting scenarios.
>
> However, you do have to obey MPI semantics. You need to ask yourself what
> the benefits and drawbacks are. Are high bandwidth and low latency
> requirements for your system? Or is writing this with sockets easier
> (and therefore always using TCP) because you'll potentially have fewer
> constraints (particularly in the FT arena)?
>
> Sorry that some of this has sounded like an advertisement for Open MPI;
> I'm actually quite excited about it because we're doing things in an MPI
> implementation that should have been done a long time ago and will enable
> things like you're trying to do.
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/