
LAM/MPI Development Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-11-22 15:37:28


On Nov 18, 2005, at 9:45 AM, ~{Ea6{Cw~} wrote:

> I've successfully debugged and installed krb4-devel, PSR and
> OpenPBS-2.3.16 on my server (Red Hat 7.3).
> After installation, I made a public key and a secret key using "mkpsrauthrkeys"
> and generated my epass key using "pbspwstore" into the $HOME/.psr_authr/
> directory. But I cannot get my token when I submit interactive jobs.
> There are some questions that puzzle me:

I have to admit that it's been many many years since we developed and
used this software -- we are no longer at an AFS site, so I have no
facilities to test and debug anymore. :-(

> 1. How can I make sure that everything is working? By submitting an
> interactive job and then running "tokens"?

Yes, this is a good way.

> 2. Is it necessary to submit interactive jobs? Why?

No. This worked for non-interactive jobs as well.
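A non-interactive check can be sketched as a small batch job. This is a hypothetical script, not part of PSR: it assumes OpenAFS's `tokens` command is on the job's PATH and that each held token is reported on a line containing `tokens for afs@`, as in typical OpenAFS output. The function names `has_afs_token` and `report_token_status` are invented for illustration, and the check only runs when `PBS_JOBID` (set by PBS/Torque inside a job) is present.

```sh
#!/bin/sh
#PBS -N tokencheck
#PBS -j oe
# Hypothetical PBS batch job that reports whether it received an AFS token.

# Succeeds if the given `tokens` output lists at least one AFS token.
# Assumes OpenAFS-style output, where each held token is reported on a line like
#   User's (AFS ID 1234) tokens for afs@example.cell [Expires ...]
has_afs_token() {
    printf '%s\n' "$1" | grep -q 'tokens for afs@'
}

# Prints the token list and returns failure if no token is held.
report_token_status() {
    out=$(tokens 2>&1)
    printf '%s\n' "$out"
    if has_afs_token "$out"; then
        echo "AFS token present"
    else
        echo "no AFS token: writes to AFS directories will fail" >&2
        return 1
    fi
}

# PBS/Torque sets PBS_JOBID inside a job; skip the check when the file is run elsewhere.
if [ -n "${PBS_JOBID:-}" ]; then
    report_token_status || exit 1
fi
```

Note that without a token, PBS may also be unable to copy the job's output file back to an AFS home directory, so a missing output file is itself a symptom of the same problem.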

> As an ordinary user, I get a token as soon as I log in. But
> when I run "qsub -I jobname",
> it reports:
> [hpcsvr02] /afs/ihep.ac.cn/users/p/pemxz > qsub -I jobi
> qsub: waiting for job 12.hpcsvr02.ihep.ac.cn to start
> qsub: job 12.hpcsvr02.ihep.ac.cn ready
>
> -bash: [: -: integer expression expected
> -bash: [: -ge: unary operator expected

Yikes; that doesn't sound right. Can you tell where this is coming
from? If I recall correctly, PSR was a compiled executable -- this
looks like a shell script error (e.g., in your prologue, or your shell
startup files).
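For reference, those two messages are what the `[` (test) builtin prints when a numeric comparison receives an empty or non-numeric operand, usually an unquoted variable. The snippet below is a hypothetical illustration; the function names and the threshold of 5 are invented and are not taken from any real prologue or startup file.

```sh
#!/bin/bash
# Hypothetical illustration of how a startup-file test can produce the errors above.
# The first argument stands in for any variable a prologue or ~/.bashrc compares numerically.

check_unsafe() {
    # Unquoted and unvalidated: an empty value makes this `[ -ge 5 ]`
    # ("-ge: unary operator expected"); a value of "-" makes it `[ - -ge 5 ]`
    # ("-: integer expression expected"). Either error falls through to "else".
    if [ $1 -ge 5 ]; then echo high; else echo low; fi
}

check_safe() {
    # Quote the expansion and require a non-negative integer before comparing.
    case "$1" in
        ''|*[!0-9]*) echo unset; return ;;
    esac
    if [ "$1" -ge 5 ]; then echo high; else echo low; fi
}

check_unsafe ""    # reports a [ error on stderr, then prints "low"
check_safe ""      # prints "unset"
```

Running the login shell with `bash -x` can help locate which startup file contains the failing test.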

> Then I run "tokens", and it returns nothing:
>
> [hpcsvr02] /afs/ihep.ac.cn/users/p/pemxz > tokens
> Tokens held by the Cache Manager:
>
> --End of list--

I'm *guessing* that if something goes wrong in the prologue or your
shell startup files, then PSR won't be run and you won't get a token.

> 3. I tried to submit a non-interactive job, but the status of the job
> finally shows "E". Then I found these errors reported
> by "qstat -f":
> ....
> sched_hint = Post job file processing error; job
> 15.hpcsvr02.ihep.ac.cn on host hpcsvr02.ihep.ac.cn/0
>
> Unable to copy file 15.hpcsvr02.OU to
> hpcsvr02.ihep.ac.cn:/afs/ihep.ac.cn/users/p/pemxz/jobs.o15
>>>> error from copy
> /bin/cp: cannot create regular file
> `/afs/ihep.ac.cn/users/p/pemxz/jobs.o15': Permission denied

This makes perfect sense -- if you have no token, you'll get all these
AFS errors.

> I tried starting PBS/Torque as root AFTER getting root's token, and the
> problem went away. But should I always have to refresh root's token?
> That seems ironic.

root should not have an AFS token, right? I'm guessing you can copy to
root's $HOME because it's not on AFS.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/