Thanks Jeff!
As I often do, I shot first and aimed later. A little more complete
reading of the specs would have helped. Thanks for the gracious rescue.
It will help me in my ongoing MPI endeavors as well.
Good luck with Open MPI!
/jr
---
Jeff Squyres wrote:
> On Aug 29, 2005, at 9:04 PM, John Robinson wrote:
>
>
>>>[snipped]
>>
>>I agree with what you propose about error codes. All I am reporting is
>>what I have observed. If I had designed it, I would have done it much
>>more along the lines you suggest.
>>
>>Note that I turn on exceptions, which is only helpful in a C++ context.
>> In C++:
>>
>> MPI::COMM_WORLD.Set_errhandler( MPI::ERRORS_THROW_EXCEPTIONS );
>
>
> You can do something similar in C and Fortran, except use the
> MPI_ERRORS_RETURN error handler. Check out the MPI-1 specification for
> a full description of its error handling capabilities (and
> limitations). MPI::ERRORS_THROW_EXCEPTIONS was added in MPI-2 with the
> C++ bindings because it's a natural error mechanism that is available
> in C++ that is not available in C or Fortran. So yes, it's only
> available in C++, but it only makes sense in C++.
>
>
>>That is not a very good answer. This situation leads me to wonder if
>>the accept/connect and publish/lookup functions were added to the
>>standard at the last minute. They do not seem to be well thought
>>through, IMHO.
>
>
> No, there was actually a lot of debate for many months about the whole
> dynamic chapter. :-)
>
> The error mechanisms for these functions are quite consistent with all
> the other MPI functions.
>
>
>>>This background process should be part of the MPI daemon, not the
>>>applications, right?
>>>The daemon has all the knowledge of which processes published what
>>>names and which processes died, and it is a background process.
>>
>>I meant writing a daemon that could poke into the MPI environment to
>>figure out if something is stopped or hung. If some MPI states cause
>>the program to abort, you could still monitor using a shell script that
>>decodes the completion status of the programs it invokes, and write a
>>lot of little programs for each probe (MPI present, name published,
>>test connection works, etc.). But this gets pretty messy.
>
>
> Agreed. This is a function of LAM's implementation. We had Grand
> Plans to make this more fine-grained and more useful for things like
> this, but then we started working on Open MPI (see a mail from me about
> this earlier this morning -- starting work on Open MPI meant stopping
> work on many things in LAM, with the intent that we would
> [eventually] do them in Open MPI instead).
>
> We do plan to have such fine-grained tools in Open MPI (e.g., command
> line tools to unpublish a name, kill a specific process and/or parallel
> job, etc.), but they will not be included in Open MPI v1.0.
>
>
>>I am not aware of a way to hook a user program to the MPI daemon (lamd
>>in the case of LAM), but I like your suggestion. Another approach
>>would be a more elaborate utility that could check various MPI states
>>(think, an expanded mpitask), and return completion status to its
>>caller.
>
>
> I replied something about this in a mail a few minutes ago -- you might
> want to combine MPI semantics with a few semantics of your own (e.g.,
> lockfiles or somesuch) to know when processes are there and/or dead,
> etc. It would take some thought, but you should be able to produce a
> reasonable-enough (although probably not perfect) system to get it
> right 99.9% of the time.
>
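A minimal sketch of the lockfile idea suggested above, assuming C++17 and a
filesystem visible to both the MPI processes and whatever monitors them. The
helper names (touch_heartbeat, looks_alive, release_heartbeat) and the
staleness threshold are hypothetical; none of this is part of MPI, LAM, or
Open MPI:

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: a running process calls this periodically.
// Rewriting the file refreshes its modification time.
void touch_heartbeat(const fs::path& lockfile) {
    std::ofstream out(lockfile, std::ios::trunc);
    out << "alive\n";
}

// Hypothetical helper: a monitor treats a missing or stale lockfile
// as "process is probably gone". This is a heuristic, not a guarantee.
bool looks_alive(const fs::path& lockfile, std::chrono::seconds max_age) {
    std::error_code ec;
    const auto mtime = fs::last_write_time(lockfile, ec);
    if (ec) {
        return false;  // file missing or unreadable
    }
    const auto age = fs::file_time_type::clock::now() - mtime;
    return age <= max_age;
}

// Hypothetical helper: a process calls this on clean shutdown.
void release_heartbeat(const fs::path& lockfile) {
    std::error_code ec;
    fs::remove(lockfile, ec);  // ignore errors; the file may already be gone
}
```

A process would call touch_heartbeat on a timer and release_heartbeat on a
clean exit; a monitor would call something like
looks_alive(path, std::chrono::seconds(30)). A crashed process stops
refreshing the file, so it goes stale. Clock skew between nodes and network
filesystem caching can blur the result, which fits the "probably not
perfect" caveat above.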
>
>>Not_a_MPI_designer,_just_a_user_ly y'rs
>
>
> We like input from everyone -- even criticism! :-)
>