LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-02-15 17:07:11


This is a bit off-topic for this list, but: you might want to have a
look at some of the parallel cluster tools out there (e.g., the C3
tools). Several nice toolsets provide similar functionality
(and more).
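
The core pattern in the quoted script below (run one command on every node from <start> to <end>) is repeated once per subcommand. A minimal sketch of that loop as a single helper follows. It is written in POSIX sh rather than the tcsh of the original, and `run_on_nodes`, `NODE_PREFIX`, and `DRY_RUN` are names invented for this sketch; only the `felcl` host prefix comes from the script itself.

```shell
#!/bin/sh
# Hypothetical helper: run one command on nodes felcl<start>..felcl<end>.
# NODE_PREFIX, DRY_RUN, and run_on_nodes are assumptions made for this
# sketch; they do not appear in the original script.

NODE_PREFIX="${NODE_PREFIX:-felcl}"

run_on_nodes() {
    start=$1
    end=$2
    shift 2                      # the remaining arguments are the command
    num=$start
    while [ "$num" -le "$end" ]; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Print the command instead of contacting the node.
            echo "ssh ${NODE_PREFIX}${num} $*"
        else
            printf '%s%s: ' "$NODE_PREFIX" "$num"
            # -n keeps ssh from reading the caller's stdin.
            ssh -n "${NODE_PREFIX}${num}" "$@"
        fi
        num=$((num + 1))
    done
}

# Dry-run demonstration; nothing is executed remotely.
DRY_RUN=1
run_on_nodes 2 4 uptime
```

With `DRY_RUN` set, the demonstration prints the three `ssh` commands for felcl2 through felcl4 without opening any connections. Because the loop tests `num <= end` up front, an empty range (start greater than end) runs nothing.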

On Feb 15, 2006, at 12:00 PM, Eric Adint wrote:

> I have been using a 64-node OS X cluster and have developed an
> admin script that does a lot of my management for me. I thought I
> would post it here for others who may be interested in it.
>
>
> #!/bin/tcsh
>
> if ("${1}" == "") then
> echo 'This is the primary admin script for the cluster. You must be root and have'
> echo 'admin privileges and a ~/pss file to run it. The following commands are available:'
> echo
> echo 'cladmin shutdown <start> <end> -- shuts down the indicated cluster nodes;'
> echo 'most likely you really want restart'
> echo
> echo 'cladmin restart <start> <end> -- shuts down and restarts a range of nodes'
> echo
> echo '----- do not use shutdown and restart on node 1 under any circumstances ------'
> echo
> echo 'cladmin stat <start> <end> <procname> -- performs a stat on the cluster nodes;'
> echo ' it lists all the processes or just <procname>'
> echo
> echo "cladmin st <start> <end> -- Joe's short version of stat"
> echo
> echo 'cladmin killlam <start> <end> -- an attempt to fix the LAM system; it may or may not work'
> echo
> echo
> echo 'cladmin update <start> <end> "<package name and location>" -- installs Apple packages'
> echo
> # Stop here: the unquoted ${1} comparisons below fail when no argument is given.
> exit 0
> endif
>
>
>
> if (${1} == "shutdown") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} sudo shutdown -h now &
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
> if (${1} == "restart") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} sudo shutdown -r now &
> @ num++
> sleep 10
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
>
>
> if (${1} == "stat") then
> set num=$2
> set end=$3
> while ( $num )
> echo "Jobs running on the following computer felcl${num}: "
> # Quote $4 so the test still parses when no process name is given.
> if ("$4" == "") then
> ssh felcl${num} ps -arcux | grep -v "root \|nobody \|administ "
> else
> ssh felcl${num} ps -arcux | grep -v "root \|nobody \|administ " | grep "$4"
> endif
>
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
> if (${1} == "statp") then
> set num=$2
> set end=$3
> while ( $num )
> echo "Jobs running on the following computer felcl${num}: "
> ssh felcl${num} ps -aucx | grep LWSN
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
>
> if (${1} == "st") then
> set num=$2
> set end=$3
> echo "Jobs running on the master node felcl${num}: "
> ssh felcl${num} ps -arcux | grep -v "root \|nobody \|administ "
> echo "Load level on each node:"
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} uptime
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
> endif
>
> if (${1} == "killlam") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} sudo killall lamd
> ssh felcl${num} sudo killall -HUP xinetd
> ssh felcl${num} sudo killall -HUP sshd
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
> if (${1} == "update") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} sudo installer -verbose -pkg $4 -target /
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
> endif
>
> if (${1} == "time") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} sudo timed
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
> if (${1} == "timecheck") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} date&
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
> if (${1} == "timekill") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} sudo killall timed&
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
> if (${1} == "timeset") then
> set num=$2
> set end=$3
> while ( $num )
> echo -n "felcl${num}:"
> ssh felcl${num} sudo timed -F felcl1&
> @ num++
> if ($num >$end) then
> exit (0)
> endif
> end
>
> endif
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
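
The filtering done by the stat and st subcommands in the quoted script (drop rows owned by root, nobody, and administ, then optionally keep only a named process) can be sketched as a small filter over `ps`-style text. This is a POSIX sh sketch, not the author's tcsh; `filter_user_procs` is a hypothetical name. Unlike the original pattern, it anchors each user name at the start of the line, which is a deliberate variation so that a command name containing "root " is not filtered out.

```shell
#!/bin/sh
# filter_user_procs is a hypothetical name for this sketch. It reads
# ps-style text on stdin, drops rows owned by the system accounts named
# in the quoted script, and optionally keeps only rows that match a
# process name given as the first argument.
filter_user_procs() {
    procname=${1:-}
    if [ -n "$procname" ]; then
        grep -v -e '^root ' -e '^nobody ' -e '^administ ' | grep -- "$procname"
    else
        grep -v -e '^root ' -e '^nobody ' -e '^administ '
    fi
}

# Demonstration on canned input; no node is contacted.
filter_user_procs LWSN <<'EOF'
USER       PID COMMAND
root         1 launchd
joe        412 LWSN
nobody      99 mDNSResponder
EOF
```

On the canned input, only the row owned by joe survives both filters. In real use the input would come from something like `ssh felcl2 ps -arcux`, as in the script.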

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/