LAM/MPI General User's Mailing List Archives

From: jerome lefevre (jlefevre_at_[hidden])
Date: 2005-06-30 05:51:49


Hi,

OK, here is a snapshot from my shell. I can confirm that there is no
pbs.conf, neither on the head node nor on the compute nodes.
I will read more about PBS and perhaps reinstall it.

Good day
Jérôme

 *************************************************************
[root_at_editr root]# updatedb
[root_at_editr root]# locate pbs.conf
[root_at_editr root]#
[root_at_editr root]# cexec 'updatedb'
************************* oscar_cluster *************************
--------- node1---------
Warning: No xauth data; using fake authentication data for X11 forwarding.
--------- node2---------
--------- node3---------
Warning: No xauth data; using fake authentication data for X11 forwarding.
--------- node4---------
Warning: No xauth data; using fake authentication data for X11 forwarding.
[root_at_editr root]# cexec 'locate pbs.conf'
************************* oscar_cluster *************************
--------- node1---------
Warning: No xauth data; using fake authentication data for X11 forwarding.
--------- node2---------
--------- node3---------
Warning: No xauth data; using fake authentication data for X11 forwarding.
--------- node4---------
Warning: No xauth data; using fake authentication data for X11 forwarding.
[root_at_editr root]#

[umr65_at_editr umr65]$ lamboot -v

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

n-1<29068> ssi:boot:base:linear: booting n0 (node1.cluster.ird.nc)
n-1<29068> ssi:boot:base:linear: booting n1 (node2.cluster.ird.nc)
n-1<29068> ssi:boot:base:linear: booting n2 (node3.cluster.ird.nc)
n-1<29068> ssi:boot:base:linear: booting n3 (node4.cluster.ird.nc)
n-1<29068> ssi:boot:base:linear: booting n4 (editr.cluster.ird.nc)
n-1<29068> ssi:boot:base:linear: finished
[umr65_at_editr umr65]$ lamhalt

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

[umr65_at_editr umr65]$ qsub -lnodes=2 -I
qsub: waiting for job 47.editr.cluster.ird.nc to start
Do you wish to terminate the job and exit (y|[n])? y
Job 47.editr.cluster.ird.nc is being deleted
[umr65_at_editr umr65]$ qstat -f
Job Id: 47.editr.cluster.ird.nc
    Job_Name = STDIN
    Job_Owner = umr65_at_[hidden]
    job_state = Q
    queue = workq
    server = editr.cluster.ird.nc
    Checkpoint = u
    ctime = Thu Jun 30 17:04:02 2005
    Error_Path = editr.cluster.ird.nc:/home/umr65/STDIN.e47
    exec_host = node1.cluster.ird.nc/0+editr.cluster.ird.nc/0
    Hold_Types = n
    interactive = True
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Thu Jun 30 17:04:15 2005
    Output_Path = editr.cluster.ird.nc:/home/umr65/STDIN.o47
    Priority = 0
    qtime = Thu Jun 30 17:04:02 2005
    Rerunable = True
    Resource_List.cput = 10000:00:00
    Resource_List.ncpus = 1
    Resource_List.nodect = 2
    Resource_List.nodes = 2
    Resource_List.walltime = 10000:00:00
    Variable_List = PBS_O_HOME=/home/umr65,PBS_O_LANG=fr_FR.UTF-8,
        PBS_O_LOGNAME=umr65,
        PBS_O_PATH=/opt/intel_fc_81/bin:/usr/pgi/linux86/5.2/bin:/opt/intel_fc_81/bin:/usr/pgi/linux86/5.2/bin:/usr/kerberos/bin:/opt/lam-7.1_pgi/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/opt/c3-4/:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/ferret_V58/bin:/opt/netcdf-3.6_pgi/bin:/opt/NCO_300/bin:/home/umr65/bin:./:/opt/ferret_V58/bin:/opt/netcdf-3.6_pgi/bin:/opt/NCO_300/bin,
        PBS_O_MAIL=/var/spool/mail/umr65,PBS_O_SHELL=/bin/bash,
        PBS_O_HOST=editr.cluster.ird.nc,PBS_O_WORKDIR=/home/umr65,
        PBS_O_QUEUE=workq
    comment = Job started on Thu Jun 30 at 17:04
    etime = Thu Jun 30 17:04:02 2005
 [umr65_at_editr umr65]$
 ********************************************************************
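
The qstat -f output above can be checked mechanically: a job that asked for
two nodes should show two entries in exec_host. What follows is only a
minimal sketch, not something PBS provides. The helper name check_job_nodes
is hypothetical, and the sketch assumes that exec_host fits on a single line
(qstat -f wraps long values, which this does not handle).

```shell
#!/bin/sh
# Sketch: compare the node count a job requested with the hosts PBS assigned.
# Reads `qstat -f` style text on stdin. check_job_nodes is a hypothetical name.
check_job_nodes() {
    awk '
        # Attribute lines look like "    Resource_List.nodect = 2".
        $1 == "Resource_List.nodect" { requested = $3 }
        $1 == "job_state"            { state = $3 }
        $1 == "exec_host" {
            # exec_host is a "+"-separated list of host/cpu entries.
            # Assumption: the whole list is on this one line.
            assigned = split($3, hosts, "[+]")
        }
        END {
            printf "state=%s requested=%d assigned=%d\n", state, requested, assigned
            # Exit status 0 only when the two counts agree.
            exit (requested == assigned ? 0 : 1)
        }
    '
}
```

Typical use would be `qstat -f 47.editr.cluster.ird.nc | check_job_nodes`.
The function only compares counts; it does not explain why a job stays in
state Q.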

> The PBS config files only appear on one machine (the head node?); make
> sure to check for them on the relevant node. However, the output from
> qmgr and qstat means that PBS is configured somehow -- I'm guessing
> that you're looking for the config files in the wrong place.
>
> The big question is this:
>
> > > But if I request a PBS job in interactive mode, like this:
> > >
> > > [umr65_at_editr SCRATCH]$ qsub -lnodes=2 -I
> > > qsub: waiting for job 43.editr.cluster.ird.nc to start
> > >
> > > After a long time, PBS is still waiting ... If I check with "qstat -f",
> > > "Resource_List.ncpus" is always equal to 1. However, I asked for 2 nodes!
> > > What is wrong? Do you want other logs or output?
>
> Do you ever get a job shell? From this text, it's not clear if your
> prior tests were with this same qsub line or a different command line
> (because you said that your prior tests *ran*, but unexpectedly only
> with one node).
>
> You never answered my questions about what you meant by your problems
> with lamds possibly remaining on a node after the job completed. So
> I'm assuming that I either misunderstood the question, or it's somehow
> no longer a problem.
>
> So I hand this thread off to the OSCAR list and someone who can answer
> PBS questions...
>
> Once you can reliably get a PBS job with the right number of nodes, try
> lamboot again; I'm guessing that LAM will do the Right Thing (i.e.,
> running "lamnodes" after lamboot will show all the nodes in your job).
> If it doesn't, post back to the LAM list, but please include a direct
> cut-n-paste from your shell output showing all the steps you took that
> result in lamboot not using all the nodes in your job. Thanks!
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
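
The check suggested in the quoted reply (run lamboot inside the PBS job, then
compare lamnodes against the nodes in the job) can be sketched as below. This
is a rough outline under assumptions, not a tested recipe for this cluster:
unique_hosts is a hypothetical helper, and it assumes PBS sets $PBS_NODEFILE
inside the job and that LAM picks up the job's nodes when lamboot is run
there.

```shell
#!/bin/sh
# Sketch: count the distinct hosts listed in a PBS node file.
# unique_hosts is a hypothetical helper; it takes the node file path.
unique_hosts() {
    # A host may be listed once per allocated CPU, so de-duplicate first,
    # then count the non-empty lines that remain.
    sort -u "$1" | grep -c .
}

# Inside an interactive job (qsub -lnodes=2 -I), the steps would be roughly:
#   unique_hosts "$PBS_NODEFILE"   # how many hosts PBS assigned
#   lamboot -v                     # boot LAM inside the job
#   lamnodes                       # should list those same hosts
#   lamhalt                        # shut LAM down before leaving the job
```

If the count from unique_hosts and the number of nodes reported by lamnodes
differ, that shell session is the cut-n-paste worth posting to the list.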