Dear all,
I have a problem.
I am developing a job scheduling system for a PC cluster.
To control parallel (MPI) jobs, I need the PIDs of all the MPI
processes that belong to a job.
Unfortunately, I can't get the PIDs of the MPI processes running on remote nodes.
For example,
I execute "mpirun -np 3 ./cpi" on the server node, and the MPI program then
runs on nodes 1, 2 and 3.
How do I get the PIDs on nodes 1, 2 and 3?
I tried using the "ps" command to find the PIDs by matching the user ID and program name,
but there is still a problem: if a user runs several jobs with
the same program name,
I can't tell which processes belong to which job.
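To make the "ps" approach concrete, here is a minimal sketch in Python of the matching described above. The names parse_ps and pids_on_this_node are only illustrative, and the sketch assumes a POSIX-style ps that accepts -e and -o. It would have to run on each node, and it shows the limitation: processes from different jobs with the same user and program name come back in one list.

```python
import subprocess


def parse_ps(ps_text, user, program):
    """Return the PIDs in ps output whose owner and command name match.

    Expects one process per line in the form "PID USER COMMAND",
    as printed by: ps -e -o pid= -o user= -o comm=
    """
    pids = []
    for line in ps_text.splitlines():
        fields = line.strip().split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, owner, comm = fields
        if owner == user and comm == program:
            pids.append(int(pid))
    return pids


def pids_on_this_node(user, program):
    """Run ps on the local node and filter by user and program name."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "user=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, user, program)


if __name__ == "__main__":
    # Made-up sample: user "alice" has two "cpi" processes on this node.
    # Nothing in the ps output says which job each one belongs to.
    sample = "  101 alice cpi\n  102 alice cpi\n  103 bob cpi\n"
    print(parse_ps(sample, "alice", "cpi"))
```

The sample data is invented for illustration. If two jobs from the same user both run "cpi" on a node, both PIDs match and the scheduler cannot separate them from this information alone.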
I traced the source code of OpenPBS, but I couldn't find the solution.
(OpenPBS handles this well.)
So, how can I get the PIDs on nodes 1, 2 and 3?
Thank you.