LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-10-17 09:44:26


Sorry for the delay in replying -- I was traveling last week and fell
behind on mail.

The master/slave example in the LAM distribution shows the limits of
what it can do -- as I mentioned, it's unfortunately somewhat immature.

You might want to look at FT-MPI -- http://icl.cs.utk.edu/ftmpi/ --
they have mechanisms for handling the loss of processes in a reliable
manner. This technology will eventually be incorporated into Open MPI
as well (perhaps by the middle of next year? That's somewhat of a SWAG
on the time schedule).
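
One application-level way to approach the "is a node down?" question,
independent of LAM's grow/shrink support, is for the master to track when
it last heard from each worker and to treat a long silence as a probable
failure. Below is a minimal sketch of that bookkeeping only. The class
name HeartbeatTracker and its methods are hypothetical, not part of
LAM/MPI, FT-MPI, or Open MPI, and the timeout value is an assumption you
would have to tune. It does not make a blocked MPI call return; it only
tells the master which ranks have gone quiet.

```cpp
#include <chrono>
#include <map>
#include <vector>

// Hypothetical helper (not an MPI or LAM API): the master records the last
// time it heard from each worker rank and flags ranks that stay silent
// for longer than a chosen timeout.
class HeartbeatTracker {
public:
    using Clock = std::chrono::steady_clock;

    explicit HeartbeatTracker(std::chrono::milliseconds timeout)
        : timeout_(timeout) {}

    // Call whenever any message (a result or an explicit ping) arrives
    // from `rank`. `now` is injectable so the logic can be tested.
    void record(int rank, Clock::time_point now = Clock::now()) {
        last_seen_[rank] = now;
    }

    // Ranks whose last message is older than the timeout, in rank order.
    std::vector<int> suspects(Clock::time_point now = Clock::now()) const {
        std::vector<int> out;
        for (const auto& entry : last_seen_) {
            if (now - entry.second > timeout_) {
                out.push_back(entry.first);
            }
        }
        return out;
    }

    // Stop tracking a rank, e.g. once its work has been reassigned.
    void forget(int rank) { last_seen_.erase(rank); }

private:
    std::chrono::milliseconds timeout_;
    std::map<int, Clock::time_point> last_seen_;
};
```

In a master/slave layout, the master would call record() each time a
worker's message arrives and periodically call suspects() to decide which
work units to hand to someone else. Whether the MPI layer underneath
survives the loss of the process is a separate problem, as noted above.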

On Oct 11, 2005, at 10:56 AM, Alastuey, Lucas wrote:

> I'm trying to implement a solver for distributed genetic algorithms
> on a local network, and it has to be fault tolerant. The solver has
> to be able to detect when nodes are disconnected and keep running.
>
> How do I know if a node is down?
>
> Any ideas?
>
>
> -----Original Message-----
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Thursday, October 6, 2005 10:00
> To: General LAM/MPI mailing list
> Subject: Re: LAM: lamshrink signal ?
>
> The application side of grow / shrink is fairly immature -- we stopped
> developing it quite a while ago (because of limitations in LAM and
> because we're doing better/more interesting things in Open MPI :-) --
> although they're not ready yet :-\ ). It's not really integrated well
> into the MPI framework -- you'd have to dip down into the LAM run-time
> to be able to intercept this and handle it properly. And even then,
> it's not clear that LAM's MPI layer would be able to handle the death
> of a process properly.
>
> What are you trying to do?
>
>
> On Oct 5, 2005, at 10:38 AM, Alastuey, Lucas wrote:
>
>> Hello list. I have a question. When a node goes down, the SIGSHRINK
>> signal is sent, but who receives it -- the other nodes? How do I read
>> the signal? For example, I do a
>> MPI::COMM_WORLD.Send(buf,100,MPI::BYTE,dest,Tag)
>>
>> How does the sending node know that the other node is down? In the
>> example case it just hangs.
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/
>

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/