LAM/MPI General User's Mailing List Archives

From: sid de (siddhartha.de87_at_[hidden])
Date: 2008-09-12 11:39:52


Hi,

I am new to LAM and so far have run only a few basic master/slave programs, in which process 0 acts as the master and distributes the work to the other processes. I am programming LAM on a 2-node Beowulf cluster. The problem is that when I invoke my program with mpirun like this:
 
   mpirun -np 10 myprog
and any of the processes dies, LAM exits with this message:
  one of the many processes started by mpirun has failed...
  process ... on node ...  has terminated ...
Now, I understand that this is how LAM is supposed to behave: if any process in a communicator (MPI_COMM_WORLD in this case) dies, LAM kills all the processes in that communicator. Is that right, or am I missing something?

Moreover, what if I want a program that spawns multiple slave processes such that, should any slave fail, the master immediately finds out and redistributes its job?
Are there any simple code examples?
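
A minimal sketch of the master-side bookkeeping such a program would need, under stated assumptions. One possible structure is for the master to start each slave with its own MPI_Comm_spawn call, so that every slave sits on a separate intercommunicator, and to set the MPI_ERRORS_RETURN error handler on it so that a failed call returns an error code instead of aborting. Whether a dead slave is actually reported this way depends on the MPI implementation, so treat that part as an assumption to verify. The MPI calls themselves are left out here; the block only shows how a failed slave's job could be put back in the queue. All names (job_t, jobs_init, job_assign, job_complete, slave_failed, jobs_remaining) are hypothetical.

```c
/* Hypothetical job table kept by the master (no MPI calls in this sketch). */
#define JOB_PENDING 0   /* not yet handed to any slave   */
#define JOB_RUNNING 1   /* handed to a slave, no result  */
#define JOB_DONE    2   /* result received by the master */

typedef struct {
    int state;  /* JOB_PENDING, JOB_RUNNING or JOB_DONE     */
    int owner;  /* slave index while JOB_RUNNING, else -1   */
} job_t;

/* Mark every job as pending and unowned. */
void jobs_init(job_t *jobs, int njobs)
{
    for (int i = 0; i < njobs; i++) {
        jobs[i].state = JOB_PENDING;
        jobs[i].owner = -1;
    }
}

/* Hand the first pending job to `slave`.
 * Returns the job index, or -1 if nothing is pending. */
int job_assign(job_t *jobs, int njobs, int slave)
{
    for (int i = 0; i < njobs; i++) {
        if (jobs[i].state == JOB_PENDING) {
            jobs[i].state = JOB_RUNNING;
            jobs[i].owner = slave;
            return i;
        }
    }
    return -1;
}

/* Record that the result for `job` has arrived. */
void job_complete(job_t *jobs, int job)
{
    jobs[job].state = JOB_DONE;
    jobs[job].owner = -1;
}

/* Call this when communication with `slave` has failed.
 * Every job it was running goes back to pending so that
 * job_assign() can give it to another slave.
 * Returns the number of jobs put back. */
int slave_failed(job_t *jobs, int njobs, int slave)
{
    int requeued = 0;
    for (int i = 0; i < njobs; i++) {
        if (jobs[i].state == JOB_RUNNING && jobs[i].owner == slave) {
            jobs[i].state = JOB_PENDING;
            jobs[i].owner = -1;
            requeued++;
        }
    }
    return requeued;
}

/* Number of jobs that are not finished yet (pending or running). */
int jobs_remaining(const job_t *jobs, int njobs)
{
    int left = 0;
    for (int i = 0; i < njobs; i++) {
        if (jobs[i].state != JOB_DONE) {
            left++;
        }
    }
    return left;
}
```

In a real master loop, slave_failed() would be called wherever a send or receive on a slave's communicator returns something other than MPI_SUCCESS, and the loop would keep calling job_assign() for the surviving slaves until jobs_remaining() reaches zero.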