Hello,
After installing OSCAR 4 on an RH-EL-AS-3 cluster, one of my major MPI
programs is not running correctly. Here are the details; thanks in advance
for any help.
In short, the program just sits there waiting and does nothing, whereas
normally it should produce a lot of output.
In detail, we have a 28-node cluster, including the master node, and each
node has 2 CPUs.
Originally, I was running LAM-6.5.9 on Red Hat 7.2, using the PGI
FORTRAN compiler and the GNU C compiler. The command I used to run it was:
"mpirun -O -x CYANALIB c0,1,2,3,4,5,6,7,8,9,10,11,12 My_Program"
It ran fine: when I ran "gstat -a -1", I would see 6 nodes running at about
100% CPU time, since each had two copies running.
Now I am using OSCAR 4 (LAM-7.0.6) on RH-EL-AS-3 with the GNU compilers for
both C and FORTRAN, and I have recompiled my program. With the same command,
it starts, then just sits there doing nothing. "gstat -a -1" shows only
6 nodes running at about 50% CPU time, which suggests that only one copy is
running on each node. "mpitask" shows everything running.
Does anyone have any idea what might be wrong?
Regards
Chen
===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250
phone: (410)455-6347 (primary)
(410)455-2718 (secondary)
fax: (410)455-1174
email: chen_at_[hidden]
===========================================