On Mon, Jan 05, 2004 at 06:50:04PM +0100, jess michelsen wrote:
>Just had the switches firmware upgraded. Lamboot still boots very slow
>(maybe). For 84 P4/2.4 GHz nodes being booted over a gigabit LAN, it
>takes approximately 20 minutes. Is this normal or much too slow?
If you are using the ssh boot mechanism then 20 minutes is a bit slow, but
not completely crazy. We've seen maybe 10 minutes to boot 100+ nodes.
rsh is much faster than ssh, and the '-b' flag to lamboot also speeds
things up if your environment is homogeneous, e.g.:
  setenv LAMRSH rsh
  lamboot -b $PBS_NODEFILE
  ...
It's also possible that you have name lookup problems (/etc/resolv.conf,
/etc/nsswitch.conf, /etc/hosts). A broken lookup can take maybe 30s per
node to time out before it falls back to looking up machine names/numbers
in another way. /etc/hosts.{allow,deny} checks may be causing more
name/number lookups too.
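A quick way to check for this is to time a name lookup for every node
before blaming lamboot. The following is only a rough sketch: it assumes
getent(1) is available (glibc systems) and that the hostfile has one
hostname per line, as $PBS_NODEFILE does. The function name slow_lookups
and the 2 second default threshold are made up for illustration.

```shell
#!/bin/sh
# Sketch: report nodes whose name lookup fails or is slow.
# Assumptions: getent(1) exists; hostfile has one hostname per line.
# slow_lookups is a hypothetical helper, not part of LAM or PBS.
slow_lookups() {
    hostfile=$1
    threshold=${2:-2}   # seconds; lookups at least this slow are reported
    # A PBS nodefile repeats a hostname once per CPU, so de-duplicate first.
    sort -u "$hostfile" | while read -r host; do
        [ -n "$host" ] || continue
        start=$(date +%s)
        # getent goes through /etc/nsswitch.conf, so it follows the same
        # lookup order (files, dns, ...) as most other programs on the node.
        getent hosts "$host" >/dev/null 2>&1
        status=$?
        elapsed=$(( $(date +%s) - start ))
        if [ "$status" -ne 0 ]; then
            echo "$host: lookup FAILED after ${elapsed}s"
        elif [ "$elapsed" -ge "$threshold" ]; then
            echo "$host: slow lookup, ${elapsed}s"
        fi
    done
}
```

Run it from sh or bash as "slow_lookups $PBS_NODEFILE" on the node where
lamboot is started. No output means every name resolved in under the
threshold. It only tests forward lookups from that one node, so a clean
run does not rule out reverse lookup or hosts.{allow,deny} delays on the
remote side.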
OTOH the tm boot mechanism should be nearly instantaneous, e.g.:
  lamboot
  cd $PBS_O_WORKDIR
  mpirun C ...
  lamhalt
cheers,
robin