LAM/MPI General User's Mailing List Archives

From: Yu Chen (chen_at_[hidden])
Date: 2005-02-18 10:20:49


Hi, I have been struggling with this for weeks and still have no clue. I
hope somebody can help me out :-(

I am using LAM-7.1.2.b13, compiled with the Intel-8.1 compilers (CC=icc,
CXX=icpc, FC=ifort) without problems, and it installed fine.

We have a 20-node cluster; each node has 2 AMD Athlon MP1900+ CPUs and runs
RHEL-AS-3. The same LAM is installed on each node. The data and our program
are NFS auto-mounted on each node.

After I start the program, it reads some data in, then stops right before
the real calculation begins. There are no errors or warnings; it just sits
there.

The strange thing is that when I run "mpitask", it gives this:
===================================================================
TASK (G/L) FUNCTION PEER|ROOT TAG COMM COUNT DATATYPE
0 LAM_MPI_Fortran_pr <running>
0 LAM_MPI_Fortran_pr <running>
0 LAM_MPI_Fortran_pr <running>
0 LAM_MPI_Fortran_pr <running>
0 LAM_MPI_Fortran_pr <running>
0 LAM_MPI_Fortran_pr <running>
0 LAM_MPI_Fortran_pr <running>
0 LAM_MPI_Fortran_pr <running>
===========================================
I don't think this is right, but I don't know what's wrong. Also, when I
logged in to one of the nodes being used, "ps" showed two processes of
the program I started.

But the "top" command shows this:
--------------------------------------------------------------------
CPU states: cpu user nice system irq softirq iowait idle
            total 98.6% 0.0% 1.4% 0.0% 0.0% 0.0% 100.0%
            cpu00 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 100.0%
            cpu01 98.6% 0.0% 1.4% 0.0% 0.0% 0.0% 0.0%
                    ^^^^^
                    ^^^^^
Mem: 1026712k av, 48064k used, 978648k free, 0k shrd, 6204k buff
         31884k active, 1940k inactive
Swap: 2040244k av, 0k used, 2040244k free 16740k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
5896 chen 25 0 3452 3452 1660 R 99.8 0.3 9:43 1 cyanaexe.intel-
....................... ^^^^^
                                            ^^^^^
------------------------------------------------------------------------

Only one process is actually running (cyanaexe.intel-), and %CPU should be
around 50% for the program, right? Since only one CPU is being used, as
shown in the "top" output, it seems the processes are not communicating.

I am totally clueless on this; any help is highly appreciated!!

Best regards,
Chen

===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250

phone: (410)455-6347 (primary)
         (410)455-2718 (secondary)
fax: (410)455-1174
email: chen_at_[hidden]
===========================================