It's quite possible that you're doing nothing wrong.
I'm not sure how good Torque's CPU time measurer is -- it may be
inaccurate. I know that we had some problems with this in Open PBS
(although I confess to not remembering the exact issues). It's
possible that these problems have not yet been fixed in Torque.
It's also possible that you really are using very little CPU time.
Don't forget the difference between CPU and user time -- it's quite
possible that your program spends almost all of its time in user time.
That's a very wishy-washy answer for you; sorry I don't have a better
one. :-\
Generally, however, I have found that wall clock time is really the
only reasonable way to benchmark parallel applications. Trying to
accumulate distributed CPU and user time can be misleading or inaccurate,
or may not really reflect what happened in your application. FWIW, wall
clock, while certainly a compromise, is a known and well-understood
metric.
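
As a rough illustration of how far the two numbers can drift apart, here
is a minimal single-process sketch in Python. It has nothing to do with
how Torque or LAM account for time; the helper names (measure,
mostly_waiting, mostly_computing) are made up for the example, and the
sleep is only a stand-in for a process that is blocked waiting on
something. Inside an MPI program the analogous wall clock call would be
MPI_Wtime().

```python
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()  # monotonic wall clock
    cpu_start = time.process_time()   # user + system CPU time, this process only
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def mostly_waiting():
    # Hypothetical stand-in for a blocked process: time passes, no CPU is used.
    time.sleep(0.5)


def mostly_computing():
    # Hypothetical stand-in for a compute-bound loop.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


wait_wall, wait_cpu = measure(mostly_waiting)
busy_wall, busy_cpu = measure(mostly_computing)

print(f"waiting:   wall={wait_wall:.2f}s cpu={wait_cpu:.2f}s")
print(f"computing: wall={busy_wall:.2f}s cpu={busy_cpu:.2f}s")
```

The waiting case reports about half a second of wall clock time and
close to zero CPU time, which is the same shape as a job that looks idle
in a CPU-time accounting field. Note that the CPU figure here covers
only the one process that called it; adding such figures up across the
processes of a parallel job is where things tend to get murky.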
On Nov 16, 2004, at 12:49 PM, Konstantin Skaburskas wrote:
> Hi,
>
> I wanted to play with cpu time accounting for parallel jobs. I use
> Torque 1.1.0p4.
>
> The execution time of my test parallel job is about 33 sec. of wall
> clock time on my computer.
>
> When I run my test program as one-node "parallel" job from within PBS
>
> #PBS -q batch
> #PBS -l nodes=1
> #PBS -N lam.tm.pbs
>
> lamboot
> mpirun -np 1 /home/konstan/progs/mpi/mpi_test_2
> lamhalt
>
> accounting log of PBS shows that the job used 0 sec. of cpu time -
> 'resources_used.cput=00:00:00', while used 33 sec. of wall clock time:
>
> 11/17/2004 19:29:31;E;19.doug1.ce.ut.ee;user=konstan group=konstan
> jobname=lam.tm.pbs queue=batch ctime=1100712538 qtime=1100712538
> etime=1100712538 start=1100712538 exec_host=doug1.ce.ut.ee/0
> Resource_List.neednodes=1 Resource_List.nodect=1
> Resource_List.nodes=1 Resource_List.walltime=01:00:00 session=14551
> end=1100712571 Exit_status=0 resources_used.cput=00:00:00
> resources_used.mem=3164kb resources_used.vmem=10240kb
> resources_used.walltime=00:00:33
>
> What am I doing wrong?
>
> Konstantin
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/