|
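This page is fairly long and is divided into the following sections:
Short version, Longer version, Consequences of this design, Use of PSR
with multiple systems, and When users change their password.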
Short version
The user encrypts their password with the public key for the
system that they want to run on. The encrypted password is stored in
the user's AFS space. When a job is submitted, the user's password is
automatically (and securely) decrypted to obtain the corresponding
plain-text password, which is used to obtain an AFS token. The
token-obtaining process then goes to sleep, and wakes up one hour
before the AFS token expires. It then re-decrypts the password and
refreshes the token before returning to sleep. This process repeats
until the job completes.
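The refresh timing described above can be sketched as a small calculation. This is only an illustration, not PSR's actual code: the helper name seconds_until_refresh is hypothetical, and the 25-hour token lifetime in the example is an assumed value (the real lifetime depends on the AFS cell).

```python
from datetime import datetime, timedelta

# Margin described above: wake up one hour before the token expires.
REFRESH_MARGIN = timedelta(hours=1)


def seconds_until_refresh(now: datetime, token_expires: datetime) -> float:
    """Return how long the token-obtaining process should sleep.

    Hypothetical helper: returns 0.0 if the refresh point has already passed.
    """
    wake_at = token_expires - REFRESH_MARGIN
    return max(0.0, (wake_at - now).total_seconds())


# Example: a token obtained at noon with an assumed 25-hour lifetime.
obtained = datetime(2024, 1, 1, 12, 0, 0)
expires = obtained + timedelta(hours=25)
sleep_for = seconds_until_refresh(obtained, expires)
print(sleep_for / 3600)  # hours of sleep before the first refresh
```

After each refresh, the same calculation would be repeated with the new expiration time.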
Longer version
The system administrators for the batch system run the
mkpsrauthrkeys program to generate public and private
keys. The public key is made available for users to encrypt their AFS
passwords. The private key is kept secure and installed on the
compute nodes in the batch system.
Each user runs the pwstore command (or a custom shell
script -- see below) and types in their AFS password.
pwstore encrypts the password with the public key and saves
the resulting file in the designated directory. System administrators
typically wrap the pwstore command in a shell script (a sample script
is included in the distribution). The shell script can perform
site-specific tasks such as creating a subdirectory under the user's
$HOME to store the encrypted password, setting the appropriate AFS
permissions on that directory, etc.
When the user submits a PBS job, the mother superior of the user's
job forks off a "shepherd" process that decrypts the user's password
using the private key, and then obtains a new process authentication
group and AFS token. The PBS prologue is then run, followed by the
user's job. The shepherd process wakes up one hour before the AFS
token expires to renew the token, and repeats this until the job
completes. The epilogue is then run, after which the shepherd process
is killed and the AFS token is discarded.
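The order of events in a job's life can be sketched as follows. This is a simplified model under stated assumptions, not the PBS patch itself: the function run_job_lifecycle and its five callables are hypothetical stand-ins, and in a real batch system these steps are separate processes managed by the mother superior, with the shepherd running concurrently rather than inline.

```python
from typing import Callable, List


def run_job_lifecycle(
    start_shepherd: Callable[[], None],
    prologue: Callable[[], None],
    user_job: Callable[[], None],
    epilogue: Callable[[], None],
    stop_shepherd: Callable[[], None],
) -> None:
    """Run the steps in the order described above (all names hypothetical)."""
    start_shepherd()  # decrypt password, new PAG, obtain AFS token
    prologue()        # PBS prologue runs with the token in place
    user_job()        # shepherd renews the token while this runs
    epilogue()        # PBS epilogue
    stop_shepherd()   # kill the shepherd and discard the token


# Record the order of events using simple stubs.
events: List[str] = []
run_job_lifecycle(
    lambda: events.append("shepherd started, AFS token obtained"),
    lambda: events.append("prologue"),
    lambda: events.append("user job"),
    lambda: events.append("epilogue"),
    lambda: events.append("shepherd killed, AFS token discarded"),
)
for event in events:
    print(event)
```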
Note that PBS is not the only system that PSR can be used with;
this scheme is general enough to work in most situations. PBS just
happens to be the only one that we provide a patch for.
Consequences of this design
Although this scheme is relatively safe from external attack
(nothing is completely safe from external attack),
it requires users to trust the system administrators, who
are able to retrieve a user's plain-text AFS password. Good system
administrators will not abuse this ability, but the possibility
exists and is important enough to be mentioned explicitly here. Also,
some sites have policies that explicitly prohibit anyone (even
system administrators) from knowing or obtaining other users'
clear-text passwords.
As such, this scheme may or may not be compatible with your site's
local policies. Consult with your system administrators
and/or management before installing and using this system.
Use of PSR with multiple systems
PSR can easily be used with multiple systems. This can be done by
running pwstore with multiple public keys and storing the
encrypted result in different files. For example, a user can run
pwstore with public keys A, B, and
C. The encrypted passwords are then stored in
$HOME/.psr_authr/a.epass,
$HOME/.psr_authr/b.epass, and
$HOME/.psr_authr/c.epass, respectively.
When running on the A system, the A compute nodes
use A's private key to decrypt
$HOME/.psr_authr/a.epass to obtain the plain-text
password and then an AFS token. Similarly, the B compute
nodes use B's private key to decrypt
$HOME/.psr_authr/b.epass, etc. This is a particularly
useful model when the A, B, and C systems
are owned and operated by different groups.
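The per-system file layout in the example above can be sketched as a small path helper. The function name epass_path and the sample home directory are hypothetical, and lowercasing the system name is an assumption inferred from the a.epass, b.epass, c.epass naming; a site's wrapper script may use a different convention.

```python
import posixpath


def epass_path(home: str, system: str) -> str:
    """Build the encrypted-password path for one system.

    Follows the $HOME/.psr_authr/<system>.epass layout from the example;
    the lowercase file name is an assumption based on that example.
    """
    return posixpath.join(home, ".psr_authr", f"{system.lower()}.epass")


# Hypothetical home directory, used only for illustration.
home = "/home/alice"
for system in ("A", "B", "C"):
    print(system, "->", epass_path(home, system))
```

Each system's compute nodes would read only the file encrypted with that system's public key.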
When users change their password
Note that when users change their AFS passwords, they
must re-run the pwstore program to re-encrypt
their new password for PSR. If the user has jobs running when the
password is changed and pwstore has been re-run, the shepherd will
use the new password to refresh the token the next time it wakes up.
If a user fails to re-run the pwstore program, their
queued jobs will run with system:anyuser access to the
user's AFS space, which will likely result in the job failing (since
most jobs typically try to read/write to private AFS space). If jobs
are running when the user changes their password without re-running
pwstore, the next time that the shepherd wakes up, it
will fail to refresh the AFS token properly, which will likely cause
the job to fail for the same reason.
|