LAM/MPI General User's Mailing List Archives

From: Robin Humble (rjh_at_[hidden])
Date: 2004-01-05 13:34:31


On Mon, Jan 05, 2004 at 06:50:04PM +0100, jess michelsen wrote:
>Just had the switches firmware upgraded. Lamboot still boots very slow
>(maybe). For 84 P4/2.4 GHz nodes being booted over a gigabit LAN, it
>takes approximately 20 minutes. Is this normal or much too slow?

If you are using the ssh boot mechanism then 20 minutes is a bit slow,
but not completely crazy. We've seen maybe 10 minutes to boot 100+ nodes.
Using rsh instead of ssh is much faster, and so is passing the '-b' flag
to lamboot if your environment is homogeneous, e.g. (csh syntax):
  setenv LAMRSH rsh
  lamboot -b $PBS_NODEFILE
  ...

It's also possible that you have name resolution problems
(/etc/resolv.conf, /etc/nsswitch.conf, /etc/hosts), which can take maybe
30s per node to time out before falling back to another way of looking
up machine names/numbers. /etc/hosts.{allow,deny} checks may be causing
more name/number lookups too.
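
One rough way to check for slow lookups is to time name resolution for each
host before blaming lamboot. The sketch below is only an illustration: the
function name time_lookups is made up, it assumes a node file with one
hostname per line (as in $PBS_NODEFILE), and it assumes `getent` and
`date +%s` are available, which is typical on Linux but not guaranteed
elsewhere. Timing is in whole seconds, which is enough to spot a 30s timeout.

```shell
# Hypothetical helper: time a forward name lookup for each unique host
# listed in a node file (one hostname per line).
# Assumes `getent` and `date +%s` exist on this system.
time_lookups() {
    sort -u "$1" | while read -r host; do
        # Skip blank lines in the node file.
        [ -n "$host" ] || continue
        start=$(date +%s)
        if getent hosts "$host" >/dev/null 2>&1; then
            status=ok
        else
            status=FAILED
        fi
        end=$(date +%s)
        # One line per host: name, elapsed whole seconds, lookup result.
        echo "$host $((end - start))s $status"
    done
}
```

Calling `time_lookups "$PBS_NODEFILE"` inside a job prints one line per
unique host. Any host that takes tens of seconds, or reports FAILED, is worth
a closer look. Note this only exercises the forward lookup path, not the
hosts.{allow,deny} checks.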

OTOH the tm boot mechanism should be nearly instantaneous, e.g.:
  lamboot
  cd $PBS_O_WORKDIR
  mpirun C ...
  lamhalt

cheers,
robin