LAM/MPI General User's Mailing List Archives

From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2003-05-31 10:31:03


On Tue, 27 May 2003, Phil Ehrens wrote:

> Hmm... I guess this really is going to be a major release.
>
> I am very interested in hearing about this too. Is he correct,
> was '-x' a no-op in previous releases? It made me feel awfully
> good that I was using it. Now I will be apprehensive about
> using it because it might do something ;^)

No - the -x option has done something in LAM for many, many years (the
furthest back I know personally is the 6.2b series). Keep in mind that
the -x option does not provide any additional fault tolerance for your MPI
application - only for the LAM run-time infrastructure. The MPI
application will still abort if a node fails, unless you follow the
guidelines presented in the README in <top lam src dir>/examples/fault/.

Hope this helps,

Brian

> Phil
>
> Jim Procter wrote:
> > I couldn't actually see any code using this option in the source of 6.5.9,
> > but there does seem to be some kind of polling in the 7.0.x beta. The problem
> > is that there is nothing mentioned about actually monitoring the 'heartbeat',
> > and any calls or status flags that could be used by a controlling process
> > (like a cron job) to deal with any failures.

-- 
  Brian Barrett
  LAM/MPI developer and all around nice guy
  Have a LAM/MPI day: http://www.lam-mpi.org/