
LAM/MPI General User's Mailing List Archives


From: James Fang (cf8e_at_[hidden])
Date: 2004-03-18 20:00:08


Thank you for answering the questions I had about root. Now, according to
what is specified below, shouldn't the first worker be spawned on node3?
However, the program still started spawning from node0, and node1
actually spawned two workers while node3 did not spawn anything at
all.
Or, for my purpose, should root just be set to rank? I tried that, but the
program just halts and doesn't do anything after printing out

Parent 0 is viscomp01.vision
Parent 2 is visdata2.vision
Parent 1 is visback.vision
Parent 3 is vision.sys.virginia.edu

>- How many nodes you lamboot on (the output of lamnodes, if you are using
>lam 7.xx - you can hide the names of the hostnames if you wish)

I lamboot on all four nodes

>- what version of LAM are you using.

I am using 6.5.6

>- Give me some clear examples on the sporadic spawning you see
See below:

Thank you very much for your help

James

//sample output********************************************************************************************
[cf8e_at_vision ~]$ mpirun N a.out
Parent 0 is viscomp01.vision
Parent 2 is visdata2.vision
Parent 1 is visback.vision
Parent 3 is vision.sys.virginia.edu
*************Child 3 is visdata2.vision****************
*************Child 1 is visback.vision****************
Parent size is 4
Parent size is 4
8021 is the pid
*************Child 0 is viscomp01.vision****************
3239 is the pid
Parent size is 4
*************Child 2 is visback.vision****************
15667 is the pid
On this system, a jiffy is 1/100 second
the page size is 4096 bytes
On this system, a jiffy is 1/100 second
the page size is 4096 bytes
the user time is 0
the system time is 0
the user time is 0
the approximation of processor time used by the program is 0
the system time is 0
the approximation of processor time used by the program is 0
On this system, a jiffy is 1/100 second
the page size is 4096 bytes
the user time is 0
the user time from getrusage is 0
the system time from getrusage is 0
the number of messages sent is 0
the user time from getrusage is 0
the number of messages recieve is 0
the system time from getrusage is 0
the number of messages sent is 0
Parent size is 4
8022 is the pid
the number of messages recieve is 0
the system time is 0
On this system, a jiffy is 1/100 second
the approximation of processor time used by the program is 0
the user time from getrusage is 0
the page size is 4096 bytes
the user time is 0
the system time is 0
the approximation of processor time used by the program is 0
the system time from getrusage is 0
the user time from getrusage is 0
the system time from getrusage is 0
The total amount of memory of this process and its data (in pages) are 444, 0
the number of messages sent is 0
The total amount of time (in jiffies) this process is schdule in the user
and the kernel mode is 0, 0
the number of messages sent is 0
the number of messages recieve is 0
the number of messages recieve is 0
The total amount of memory of this process and its data (in pages) are 477, 0
The total amount of memory of this process and its data (in pages) are 444, 0
The total amount of time (in jiffies) this process is schdule in the user
and the kernel mode is 0, 0
The total amount of time (in jiffies) this process is schdule in the user
and the kernel mode is 0, 0
The total amount of memory of this process and its data (in pages) are 444, 0
The total amount of time (in jiffies) this process is schdule in the user
and the kernel mode is 0, 0

//manager**************************************************************************************************************************
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <string>
#include <iostream>
#include <sys/types.h>
#include <unistd.h>
#include <fstream>
#include <sys/times.h>
#include <time.h>
#include <sys/resource.h>

using namespace std;

int main(int argc, char *argv[])
{
         int universe_size, *universe_sizep, flag;
         MPI_Comm everyone;
         char worker_program[100] = "w.o";

         MPI_Info info;
         char *lam_spawn_sched_round_robin = (char *)
"lam_spawn_sched_round_robin";

         char name [MPI_MAX_PROCESSOR_NAME];
         int namelen;

         MPI::Init(argc,argv);

         int rank = MPI::COMM_WORLD.Get_rank();
     int world_size = MPI::COMM_WORLD.Get_size();
         MPI_Get_processor_name(name, &namelen);
         cout<<"Parent "<<rank<<" is "<<name<<endl;

         MPI_Info_create (&info);
     MPI_Info_set (info, "lam_spawn_sched_round_robin", "n3");

// if(world_size != 1)
// cout<<"Top heavy with management"<<endl;

         MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
&universe_sizep, &flag);

         if(!flag)
         {
                 printf("This MPI does not support Universe_size.");
                 printf("How many processes total?");
                 scanf("%d", &universe_size);
         }

         else
                 universe_size = *universe_sizep;

         if (universe_size == 1)
                 cout << "no room to start workers"<<endl;

     //spawn workers
         // MPI_Comm_spawn writes one error code per requested process, so
         // the array needs as many entries as workers (4 here). A real array
         // is used because MPI_ERRCODES_IGNORE gave a "using void*" error.
         int err[4];

          MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, 4, info, 3,
                         MPI_COMM_WORLD, &everyone, err);
/*
    if(rank == 1)
    MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, 1, info, 1,
                   MPI_COMM_WORLD, &everyone, err);

    if(rank == 0)
    MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_WORLD, &everyone, err);

    if(rank == 2)
    MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, 1, info, 2,
                   MPI_COMM_WORLD, &everyone, err);

    if(rank == 3)
    MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, 1, info, 3,
                   MPI_COMM_WORLD, &everyone, err);
*/

    MPI::Finalize();

         return 0;
}

//worker*************************************************************************************************************************************************
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <string>
#include <iostream>
#include <sys/types.h>
#include <unistd.h>
#include <fstream>
#include <sys/times.h>
#include <time.h>
#include <sys/resource.h>
#include <mpi++.h>

using namespace std;

void InfoOutput(string infoContent, int space_count1, int space_count2, int
cpu_or_mem)
{

    string info1, info2;
    string infoShown;
    int space_count = 0; char check_for_space = ' ';
    if(cpu_or_mem == 0)
            infoShown = "The total amount of time (in jiffies) this process "
                        "is schdule in the user and the kernel mode is ";
    else
            infoShown = "The total amount of memory of this process and its "
                        "data (in pages) are ";

    for (int i = 0; i < infoContent.length(); i++)
    {

          if(infoContent[i] == ' ')
                 space_count++;

            if(space_count == space_count1)
            {

                 for (int j = i+1; j < infoContent.length(); j++)
                 {

                         if(infoContent[j] != check_for_space)
                         {

                             info1 += infoContent[j];

                        }

                         else
                                break;

                 }

            }

         if(space_count == space_count2)
            {

                 for (int j = i+1; j < infoContent.length(); j++)
                 {

                         if(infoContent[j] != check_for_space)
                         {

                                 info2 += infoContent[j];

                         }

                         else
                                 break;

                 }

            }

    }

         cout<<infoShown<<info1<<", "<<info2<<endl;

}

//***********************************************************************************************************

int main(int argc, char* argv[])
{

// Get Command from parent int size;
     MPI_Comm parent;
         char name [MPI_MAX_PROCESSOR_NAME];
         int namelen;
     int size;

         MPI::Init(argc, argv);
         MPI_Comm_get_parent(&parent);
         if(parent == MPI_COMM_NULL)
         cout<<"No Parent!";

         int rank = MPI::COMM_WORLD.Get_rank();
     int world_size = MPI::COMM_WORLD.Get_size();
         MPI_Get_processor_name(name, &namelen);
         cout<<"*************Child "<<rank<<" is "
             <<name<<"****************"<<endl;

         MPI_Comm_remote_size(parent, &size);
         cout<<"Parent size is "<<size<<endl;

         int pid = getpid();
         cout << pid << " is the pid "<<endl;
         string cpid;
         char i;

         tms cpuTime;
         times(&cpuTime);
         double jiffy = sysconf(_SC_CLK_TCK); double page_size =
sysconf(_SC_PAGE_SIZE);

         rusage cpuTimeFromG;
         int who = RUSAGE_SELF;
         getrusage(who , &cpuTimeFromG);

         cout<<"On this system, a jiffy is 1/"<<jiffy<<" second"<<endl;
         cout<<"the page size is "<<page_size<<" bytes"<<endl;

         cout<<"the user time is " << cpuTime.tms_utime <<endl;
         cout<<"the system time is " << cpuTime.tms_stime<<endl;
         cout<<"the approximation of processor time used by the program is "
             << clock()<<endl;
         cout<<"the user time from getrusage is " <<
cpuTimeFromG.ru_utime.tv_sec <<endl;
         cout<<"the system time from getrusage is " <<
cpuTimeFromG.ru_stime.tv_sec<<endl;
         cout<<"the number of messages sent is " << cpuTimeFromG.ru_msgsnd
<<endl;
         cout<<"the number of messages recieve is " <<
cpuTimeFromG.ru_msgrcv<<endl;

     while(pid > 0)
         {
                   i = (char)(pid % 10) + 48;
           pid = pid / 10;
           cpid = i + cpid; }
           string cpuInfo = "//proc//" + cpid +"//stat";
           string memInfo = "//proc//" + cpid +"//statm";

           ifstream OpenCpuInfoFile(cpuInfo.c_str());
           ifstream OpenMemInfoFile(memInfo.c_str());

                   char n1, n2;
               string cpuContent;
           string memContent;

    // open the file
   // fin.open(filename);
  // check for successful file open

           if( !OpenCpuInfoFile || !OpenMemInfoFile)
                   {
              cout << "can't open file\n";
              exit(1);
           }

    // read each file one character at a time; testing the result of get()
    // avoids appending a stale character once end-of-file is reached

            while( OpenCpuInfoFile.get(n1))
                         {
                 cpuContent+=n1;
             }

       while( OpenMemInfoFile.get(n2))
             {
                 memContent+=n2;

             }

   // cout<<"mem string s length is " << memContent.length() <<endl;
    //cout<<"cpu string s length is " << cpuContent.length() <<endl;

    // send memContent and specify the position of the space to obtain desired info
    InfoOutput(memContent, 3, 4, 1);
    // send cpuContent and specify the position of the space to obtain desired info
    InfoOutput(cpuContent, 13, 14, 0);

   // fin.close();
    OpenMemInfoFile.close();
    OpenCpuInfoFile.close();

    MPI::Finalize();

    return 0;

}

At 06:41 PM 2004/3/18 -0500, you wrote:
>Hi --
>
>
># I am using a cluster that consists of 4 nodes, i am trying to do a
># manager/worker program in which the workers monitor the system usage of the
># manager. Mainly, one node would be the manager, the 3 nodes would each
># produce a worker. But when I tried to spawn workers, it seems that they
># get spawned sporadically, for example, if I spawn 4 workers, node 0 might
># spawn 2 and node 1 might spawn 1, and then the program stops.
># I consulted the following link
>
>Just would like to have more information on this and in a bit elaborate
>way --
>
>- How many nodes you lamboot on (the output of lamnodes, if you are using
>lam 7.xx - you can hide the names of the hostnames if you wish)
>
>- Give me some clear examples on the sporadic spawning you see.
>
>- what version of LAM are you using.
>
>
># I've included some of my code below:
>#
># MPI_Info info;
># MPI_Info_create (&info);
># MPI_Info_set (info, "lam_spawn_sched_round_robin", "n3");
># MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, 3, info, 0, MPI_COMM_SELF,
># &everyone, &err);
>#
># by the way, another question is, why does root always have to be zero, it
># seems that I always get the error : MPI_Comm_spawn: invalid root (rank 0,
># MPI_COMM_SELF).
>
>One thing to note is that you have used MPI_COMM_SELF and not
>MPI_COMM_WORLD in your call to MPI_Comm_spawn. MPI_COMM_SELF just includes
>you (size 1, rank 0). And I think you are starting just one copy of
>manager. In that case you would have MPI comm size as 1, which is just
>your manager and so a rank 0 needs to be provided as argument to root.
>Root basically specifies the node where you want the arguments to
>MPI_Comm_spawn (prior to root) to be checked, all calls to MPI_Comm_spawn
>in other copies of the manager (if you start multiple copies) will ignore
>the arguments then. It does not have to be 0 if you have multiple copies
>of the manager.
>
>As I see, the above error of "invalid root" should only happen if
>(root >= size) or (root < 0). So I am unable to replicate the error here. Can you
>send across your complete application, which I think should throw more
>light.
>_______________________________________________
>This list is archived at http://www.lam-mpi.org/MailArchives/lam/