LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2004-10-13 11:51:19


That delay is normal for the current design of LAM/MPI. The signal is
only sent once the remote LAM daemons notice that the lamd has been
killed. You can speed this up in two ways. First, make sure you are
running LAM in fault-tolerant mode (the -x option to lamboot). Second,
set the compile-time option --with-lamd-hb=SEC to something lower
than the default of 120 seconds.
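
As a rough sketch of those two changes (the hostfile name and the
30-second value are only illustrative assumptions, and the heartbeat
setting takes effect only after LAM is rebuilt and reinstalled):

```sh
# 1. Rebuild LAM/MPI with a shorter lamd heartbeat interval.
#    Run from the top of the LAM/MPI source tree; 30 is an example value.
./configure --with-lamd-hb=30
make
make install

# 2. Boot the LAM universe in fault-tolerant mode.
#    "hostfile" is a placeholder for your own boot schema file.
lamboot -x hostfile
```

Keep any other configure options you normally use; only the two
options named above come from this thread.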

Brian

On Oct 12, 2004, at 10:07 PM, Vinod Kannan wrote:

> Hi,
> I am trying to use LAM/MPI's fault-tolerance capabilities
> with regard to node failure.
> I was testing the delivery of SIGSHRINK to a
> process. I set up 4 processes (in one case doing
> nothing but spinning in an infinite while loop, and in another
> case sending and receiving messages using Isend and
> Test), 2 on each node. All the processes register for
> SIGSHRINK via lam_ksignal(). The LAM universe consisted of
> 2 nodes. I did a tkill on one node and
> noticed a substantial delay in receiving
> SIGSHRINK. The delay varied from a few seconds to a
> few minutes (I crudely timed one at around 5
> minutes).
> I checked the node's CPU load (no processes
> other than mine are running) and memory (no shortage). I
> am guessing network traffic should not be a factor,
> since the local lamd is signalling processes on the
> same node (there are only 2 nodes). In any case,
> the nodes are on a LAN with little or no traffic.
> I ran the tests at various times of the day (and night) with
> similar delays. I am flushing all print statements.
> Is this behaviour normal? What could be the reason?
> Is there any way I can speed up the delivery (and
> capture) of the signal?
> Thanks for any help and advice.
> Newbie LAM/MPI user
> Vinod
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have an LAM/MPI day: http://www.lam-mpi.org/