Howdy. I am an undergrad CS student at the University of Hawaii
charged with creating a cluster that is 'resistant' to the effects of
Murphy's law, or basically, a robust and versatile system. Currently
my setup is this: I have 23 machines (with 50 or so on the way)
running FreeBSD and LAM 6.5.9. There is one master, which stores all
user data and NIS information. This master also acts as a gateway to
the internet by means of a second NIC. The other 22 nodes are all NIS
clients, which use NFS to mount home directories from the master.
Now, on to the problem.
Right now the master is static, since it is the only machine with two
NICs and the only machine that stores the NIS and user file information.
My goal is to modify this setup so that any node on the cluster can act
as the master, with the role re-assigned dynamically and automatically.
My question is, is there anything out there that might help me in this
endeavor? I want to avoid re-inventing the wheel as much as possible.
Right now my plan is to put a router in front of the switch that
networks the cluster and plug the router into the Internet, so that
re-assigning the master's role as the Internet gateway would only take
a simple cron job. I also plan on using rsync to back up user data.
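For the cron job to re-assign the master, every node needs the same rule for deciding who the master is. Below is a minimal sketch, not a tested design. It assumes a fixed, ordered list of hostnames shared by all nodes, and it treats "accepts a TCP connection on port 22 (sshd)" as "alive". The function names `tcp_alive` and `elect_master`, the port, and the timeout are all hypothetical choices made for illustration.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_alive(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Hypothetical health check: a node counts as alive if it accepts a
    TCP connection on `port` (sshd on 22 is assumed) within `timeout` s."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames.
        return False


def elect_master(nodes: Sequence[str],
                 is_alive: Callable[[str], bool] = tcp_alive) -> Optional[str]:
    """Return the first node in the fixed priority list that is alive,
    or None if no node responds.

    Every node applies the same rule to the same ordered list, so nodes
    that see the same set of reachable hosts pick the same master."""
    for node in nodes:
        if is_alive(node):
            return node
    return None
```

A cron job on each node could call `elect_master(["node01", "node02", ...])` (hypothetical hostnames) and start the NIS/NFS master duties only if the result is its own hostname. One caveat: if the network partitions, nodes on each side can see different live sets and elect different masters, so this rule alone does not prevent split-brain.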
The real problem is what happens when the master goes down while MPI
processes are running. Does LAM provide any functionality to handle
this case? Is there some way to ensure that the running processes do
not die?
I realize this was fairly long-winded... but eh.
Thanks,
Bill