LAM/MPI General User's Mailing List Archives

From: Yu Chen (chen_at_[hidden])
Date: 2005-01-13 13:42:19


Thanks, Jeff, for your reply; you are always so helpful.

> It's hard to say without more detail about your application; this could
> simply be the communication pattern of your application, that it causes
> blocking and makes processes wait for message passing to complete, etc.

But that program worked in the previous setup, and it never got changed (the
only difference is the FORTRAN compiler, PGI vs. GNU).

>
> Which RPI were you using in 6.5.9? I ask because LAM could only have one RPI
> compiled into it back in the 6.x series; only in the 7.x series did we debut
> the ability to choose your RPI at run-time.

I was using "usysv" on 6.5.9.

> I'm guessing that you should be defaulting to usysv in 7.0.6, which, since it
> uses shared memory for messages on the same node, *may* account for speed
> differences between your 6.x and 7.x runs (e.g., if you were using the tcp
> RPI in the 6.x series) and therefore expose timing problems in your code.

I used all defaults in 7.0.6 in OSCAR, so it should be usysv too.
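
(If it would help, I could also try forcing the RPI explicitly at run time;
I believe the 7.x syntax is something like

  mpirun -ssi rpi usysv -O -x CYANALIB c0,1,2,3,4,5,6,7,8,9,10,11,12 My_Program

and that "laminfo" should list which RPI modules are available, but please
correct me if I have that wrong.)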

>
> The usysv RPI uses spin locks for on-node communication, so it should spin
> (and consume all the CPU) when it's waiting for on-node communication. But
> if you're blocking waiting for off-node communication, you won't see this
> spinning behavior.

> Can you attach a debugger to any of the processes and see what they are
> doing?
>

I really don't know how to do that; could you help me with this?
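
(From what I could find, the idea seems to be to log into one of the nodes,
find the PID of the stuck My_Program process with "ps", and then attach gdb
to it and get a backtrace, roughly:

  gdb My_Program <pid>
  (gdb) bt

Is that the right approach?)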

And I forgot to mention, I successfully ran the following hello-world.c
program:

++++++++++++++++++++++++++++++++++++++++++++++++++++
#include <stdio.h>
#include <string.h>   /* for strcpy() */
#include "mpi.h"

int
main(int argc, char **argv) {

   int rank;
   char msg[20];

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0) {
     /* Rank 0 is the root of the broadcast and sends the message. */
     printf("I am the master. I am sending the message.\n\n");
     strcpy(msg, "Hello World!");
     MPI_Bcast(msg, 13, MPI_CHAR, rank, MPI_COMM_WORLD);
   } else {
     /* All other ranks participate in the same broadcast and receive it. */
     MPI_Bcast(msg, 13, MPI_CHAR, 0, MPI_COMM_WORLD);
     printf("I am the slave. I am receiving the message.\n");
     printf("The message is: %s\n", msg);
   }

   MPI_Finalize();
   return 0;
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++
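
(If it matters, that was built and launched the usual LAM way, along the
lines of "mpicc hello-world.c -o hello-world" followed by
"mpirun C hello-world", and it behaved as expected.)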

Cheers,
Chen

>
>
> On Jan 13, 2005, at 11:36 AM, Yu Chen wrote:
>
>> Hello,
>>
>> After installing OSCAR 4 on the RH-EL-AS-3 cluster, one of my major MPI
>> programs is not running right. Here are the details; thanks in advance for
>> any help:
>>
>> In short, the program just sits there, waiting and waiting, but doing
>> nothing, even though normally it gives out a lot of output.
>>
>> In detail, we have a 28-node cluster including the master node, each node
>> with 2 CPUs.
>>
>> Originally, I was running LAM-6.5.9 on Red Hat 7.2, using the PGI FORTRAN
>> compiler and the GNU C compiler. The command used to run it was:
>> "mpirun -O -x CYANALIB c0,1,2,3,4,5,6,7,8,9,10,11,12 My_Program"
>> It ran fine; when running "gstat -a -1", I would see 6 nodes at about
>> 100% CPU time, since each had two copies running.
>>
>> Now, I am using OSCAR 4 (LAM-7.0.6) on RH-EL-AS-3 with all GNU compilers (C
>> and FORTRAN); I recompiled my program, BTW. Now, with the same command, it
>> starts, then just sits there doing nothing. And "gstat -a -1" only
>> shows 6 nodes running at about 50% CPU time, which looks like only one copy
>> running on each node. "mpitask" shows everything running.
>>
>> Does anyone have any ideas?
>>
>> Regards
>> Chen
>>

===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250

phone: (410)455-6347 (primary)
         (410)455-2718 (secondary)
fax: (410)455-1174
email: chen_at_[hidden]
===========================================