LAM/MPI General User's Mailing List Archives

From: Vishal Sahay (vsahay_at_[hidden])
Date: 2004-03-18 18:41:07


Hi --

# I am using a cluster that consists of 4 nodes, i am trying to do a
# manager/worker program in which the workers monitor the system usage of the
# manager. Mainly, one node would be the manager, the 3 nodes would each
# produce a worker. But when I tried to spawn workers, it seems that they
# get spawned sporadically, for example, if I spawn 4 workers, node 0 might
# spawn 2 and node 1 might spawn 1, and then the program stops.
# I consulted the following link

I would like some more detailed information about this:

- How many nodes do you lamboot on? (Please send the output of lamnodes if
you are using LAM 7.x; you can hide the hostnames if you wish.)

- Can you give me some concrete examples of the sporadic spawning you see?

- What version of LAM are you using?

# I've included my some of my codes below:
#
# MPI_Info info;
# MPI_Info_create (&info);
# MPI_Info_set (info, "lam_spawn_sched_round_robin", "n3");
# MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, 3, info, 0, MPI_COMM_SELF,
# &everyone, &err);
#
# by the way, another question is, why does root always have to be zero, it
# seems that I always get the error : MPI_Comm_spawn: invalid root (rank 0,
# MPI_COMM_SELF).

One thing to note is that you have used MPI_COMM_SELF and not
MPI_COMM_WORLD in your call to MPI_Comm_spawn. MPI_COMM_SELF includes only
the calling process (size 1, rank 0). I also think you are starting just
one copy of the manager. In that case the communicator has size 1 and
contains only your manager, so 0 is the only rank you can pass as root.
Root specifies the rank in the communicator at which the arguments to
MPI_Comm_spawn that precede root are examined; the calls to MPI_Comm_spawn
in the other copies of the manager (if you start multiple copies) ignore
those arguments. Root does not have to be 0 if you start multiple copies
of the manager and spawn over a communicator that contains all of them,
such as MPI_COMM_WORLD.
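
As a rough sketch of the single-manager case described above, the program
below spawns three workers over MPI_COMM_SELF with root 0. The executable
name "./worker" is a hypothetical placeholder, and the sketch passes
MPI_INFO_NULL rather than the info object from your code, so placement of
the workers is left to the defaults. The error-code array is sized to the
number of requested processes.

```c
#include <mpi.h>
#include <stdio.h>

#define NUM_WORKERS 3   /* number of workers requested in the quoted code */

int main(int argc, char *argv[])
{
    MPI_Comm everyone;              /* intercommunicator to the workers */
    int errcodes[NUM_WORKERS];      /* one error code per requested process */
    int self_size, self_rank, i;
    char worker_program[] = "./worker";   /* hypothetical executable name */

    MPI_Init(&argc, &argv);

    /* MPI_COMM_SELF always has size 1, so the only valid rank in it is 0. */
    MPI_Comm_size(MPI_COMM_SELF, &self_size);
    MPI_Comm_rank(MPI_COMM_SELF, &self_rank);
    printf("MPI_COMM_SELF: size %d, rank %d\n", self_size, self_rank);

    /* root = 0 because the communicator is MPI_COMM_SELF. */
    MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, NUM_WORKERS,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &everyone, errcodes);

    /* The default error handler aborts on failure, so this loop only
     * matters if the handler has been changed to MPI_ERRORS_RETURN. */
    for (i = 0; i < NUM_WORKERS; i++) {
        if (errcodes[i] != MPI_SUCCESS) {
            fprintf(stderr, "worker %d failed to start\n", i);
        }
    }

    MPI_Finalize();
    return 0;
}
```

If you instead start several managers and pass MPI_COMM_WORLD as the
communicator, every manager has to make the call, and any rank of
MPI_COMM_WORLD can serve as root.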

As far as I can see, the "invalid root" error should only occur if
(root >= size) or (root < 0), so I am unable to reproduce it here. Can you
send your complete application? I think that would shed more light on the
problem.
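
To make that condition concrete, here is a minimal sketch of the check as I
described it. The function name root_is_valid is hypothetical and this is
not taken from the LAM source; it only mirrors the rule that root must be a
rank inside the communicator.

```c
#include <stdbool.h>

/* Hypothetical helper, not LAM's actual code: a root is acceptable only
 * if it is a rank inside the communicator, i.e. 0 <= root < comm_size. */
bool root_is_valid(int root, int comm_size)
{
    return root >= 0 && root < comm_size;
}
```

For MPI_COMM_SELF the size is 1, so root_is_valid(0, 1) holds and any other
root is rejected.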