LAM/MPI General User's Mailing List Archives

From: Alastuey, Lucas (Lucas.Alastuey_at_[hidden])
Date: 2005-10-11 09:56:12


I'm trying to implement a solver for distributed genetic algorithms
that runs on a local network, and it has to be fault tolerant. The solver
has to be able to detect when nodes are disconnected and keep running.

How can I tell if a node is down?

Any ideas?
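
One application-level way to approach this question is a heartbeat
timeout: each worker reports in periodically, and a rank that stays
silent for too long is treated as suspected down. The sketch below is
plain C++ and shows only that bookkeeping. HeartbeatMonitor and its
methods are hypothetical names for illustration, not part of LAM/MPI,
and the timeout value is an assumption to be tuned per network.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <vector>

// Hypothetical helper (not a LAM/MPI API): remembers when each worker
// rank was last heard from and reports ranks that have been silent for
// longer than a timeout.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;

    explicit HeartbeatMonitor(std::chrono::milliseconds timeout)
        : timeout_(timeout) {
        assert(timeout.count() > 0);
    }

    // Record that `rank` was heard from at time `now`.
    void heard_from(int rank, Clock::time_point now) {
        last_seen_[rank] = now;
    }

    // Return the ranks whose last message is older than the timeout
    // as of `now`, in ascending rank order.
    std::vector<int> suspected_down(Clock::time_point now) const {
        std::vector<int> down;
        for (const auto& entry : last_seen_) {
            if (now - entry.second > timeout_) {
                down.push_back(entry.first);
            }
        }
        return down;
    }

private:
    std::chrono::milliseconds timeout_;
    std::map<int, Clock::time_point> last_seen_;
};
```

In an MPI program, the master could call heard_from() whenever a
non-blocking probe such as MPI::COMM_WORLD.Iprobe() reports a message
from a worker, and stop assigning work to the ranks returned by
suspected_down(). This only detects silence; it does not by itself
make MPI calls addressed to a dead rank safe.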

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf
Of Jeff Squyres
Sent: Thursday, October 6, 2005 10:00
To: General LAM/MPI mailing list
Subject: Re: LAM: lamshrink signal ?

The application side of grow / shrink is fairly immature -- we stopped
developing it quite a while ago (because of limitations in LAM and
because we're doing better/more interesting things in Open MPI :-) --
although they're not ready yet :-\ ). It's not really integrated well
into the MPI framework -- you'd have to dip down into the LAM run-time
to be able to intercept this and handle it properly. And even then,
it's not clear that LAM's MPI layer would be able to handle the death
of a process properly.

What are you trying to do?

On Oct 5, 2005, at 10:38 AM, Alastuey, Lucas wrote:

> Hello list. I have a question. When a node goes down, the SIGSHRINK
> signal is sent, but who receives it: the other nodes? How do I read
> the signal? For example, I do a
> MPI::COMM_WORLD.Send(buf,100,MPI::BYTE,dest,Tag)
>
> How does a node know that another node is down? In the example
> above, the call just hangs.
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/