
LAM/MPI General User's Mailing List Archives


From: Vishal Sahay (vsahay_at_[hidden])
Date: 2004-03-21 13:03:10


Hi James --

It seems that you are running into problems because of incorrect use of the
arguments to MPI_Comm_spawn. The last argument is supposed to be an array
of integers, not just a single integer. I am attaching your program with
corrections as embedded comments (marked VS:).

Also, it seems that in LAM, MPI_ERRCODES_IGNORE is incorrectly set to
(void *), which is why you may be getting errors when you use it.
Thanks for pointing that out. For now you can use (int *) 0 instead when
you don't want error codes; in that case you also don't need the
MPI_Comm_set_errhandler call that I have added to your program.

As for "lam_spawn_sched_round_robin" not working and the "rank" being 0
for all processes after spawning: that happened because your memory was being
overwritten due to incorrect usage of the last argument in the call to
MPI_Comm_spawn.

----------------------------------------------------------------------
[manager.cc]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* VS: needed for memset below */
#include <mpi.h>
#include <string>
#include <iostream>
#include <sys/types.h>
#include <unistd.h>
#include <fstream>
#include <sys/times.h>
#include <time.h>
#include <sys/resource.h>

using namespace std;

int main(int argc, char *argv[])
{
    int universe_size, universe_sizep, flag;
    /*********************************************************/
    /* VS: corrected from *universe_sizep to universe_sizep */
    /*********************************************************/

    MPI_Comm everyone;
    char worker_program[100] = "worker";

    /*********************************************************/
    /* VS: Supposed to write in binary/executable name here
     * changed w.o to worker (which is not an object file but a binary)
     */
    /*********************************************************/
    MPI_Info info;

    char *lam_spawn_sched_round_robin = (char *)
        "lam_spawn_sched_round_robin";

    char name [MPI_MAX_PROCESSOR_NAME];
    int namelen;

    int pid = getpid();

    MPI::Init(argc,argv);

    int rank = MPI::COMM_WORLD.Get_rank();
    int world_size = MPI::COMM_WORLD.Get_size();

    MPI_Get_processor_name(name, &namelen);
    cout<<"Parent "<<rank<<" is "<< name <<" and its process id is "<<
        pid<<endl;

    MPI_Info_create (&info);
    MPI_Info_set (info, "lam_spawn_sched_round_robin", "n2");

    // if(world_size != 1)
    // cout<<"Top heavy with management"<<endl;

    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                      &universe_sizep, &flag);

    if(!flag)
        {
            printf("This MPI does not support MPI_UNIVERSE_SIZE.\n");
            printf("How many processes total? ");
            scanf("%d", &universe_size);
        }

    else
        universe_size = universe_sizep;
    /*****************************************************/
    /* VS: replaced *universe_sizep with universe_sizep */
    /*****************************************************/

    if (universe_size == 1)
        cout << "no room to start workers"<<endl;

    /* VS: I don't see any significance of this -- what are you using
       this for? */

    //spawn workers
    int nnodes = 5;
    int *errcodes;

    /**********************************************************************/
    /* VS: MPI_ERRCODES_IGNORE seems to be a bug in LAM now. Thanks
       for pointing this out. For now you can use (int *) 0
       instead */

    /* VS: Also note, you are using the last argument of
       MPI_Comm_spawn in an incorrect fashion. It should be an array
       of integers, not just an int. For each process spawned, this
       function call will return the error code in its slot in the
       array. So the array should have size at least equal to the
       number of processes you spawn */

    /* VS: I am also enabling err handling below if you want to catch
       error codes */
    /**********************************************************************/

    errcodes = (int *) malloc(sizeof(int) * nnodes);
    memset(errcodes, -1, sizeof(int) * nnodes);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    MPI_Comm_spawn("worker", MPI_ARGV_NULL, nnodes, info, 0,
                   MPI_COMM_WORLD, &everyone, errcodes);

    pid = getpid();

    cout<<"Parent "<<rank<<" is "<< name <<" and its process id is "<<
        pid <<" after calling spawn"<<endl;

    /*
      MPI_Send(&pid, 1, MPI_INT, 0, 1, everyone);
      MPI_Send(&pid, 1, MPI_INT, 1, 1, everyone);
      MPI_Send(&pid, 1, MPI_INT, 2, 1, everyone);
      MPI_Send(&pid, 1, MPI_INT, 3, 1, everyone);
      MPI_Send(&pid, 1, MPI_INT, 4, 1, everyone);
    */

    MPI::Finalize();

    return 0;
}

-----------------------------------------------------------------------------------------------

-Vishal

On Sat, 20 Mar 2004, James Fang wrote:

# Hi
#
# I have given up on using lam_sche_round_robin, however, I have found that
# if I specify my root to be 3, and only 3, I will be able to create the
# round_robin effect, which spawns a worker on every cpu. If the root is not
# specified as 3, the manager program will only create worker on its own cpu,
# but not on other nodes. Could this be a hardware issue? The only thing
# special about my node 3 that I can think of is that it is the node I use to log
# on to the cluster.
#
[snipped]