
LAM/MPI General User's Mailing List Archives


From: Shaun G. (temp_usy_at_[hidden])
Date: 2009-07-29 04:12:56


Dear All,

I have an application (a legacy executable) that was written to run with MPI. I need to run this executable repeatedly, with a different input file for each run. The cluster uses P6 machines, and no MPI processes are started on the master.

Each application run is restricted to a prescribed wall-clock time limit. If a run hasn't finished within the limit, it should be terminated so the next run can start `clean'. Also, the application may hang or crash for some inputs.

How can I check:
a) if the application has crashed
b) if the application hasn't completed running within the time limit
c) if a) or b) is true, how do I kill the MPI processes that this application (the current run) started on all the nodes?
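One way to structure the driver loop for a), b) and c) is the Python sketch below. It is a minimal sketch under assumptions, not a tested LAM recipe: it assumes `mpirun` exits with a nonzero status when the application crashes, and it treats cleanup of processes left on the remote nodes as a separate, caller-supplied command (for example `lamclean`, once it is confirmed to touch only my own processes). The names `run_with_limit`, `drive` and `RunResult` are hypothetical.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class RunResult:
    input_file: str
    status: str                # "ok", "crashed" or "timeout"
    returncode: Optional[int]  # None when the run was killed for timing out


def run_with_limit(cmd: Sequence[str], limit_s: float) -> Tuple[str, Optional[int]]:
    """Run cmd once and classify the outcome."""
    try:
        proc = subprocess.run(cmd, timeout=limit_s,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the local child
        # (e.g. mpirun). Processes it started on remote nodes are NOT
        # covered by this; that is what the cleanup command is for.
        return "timeout", None
    if proc.returncode != 0:
        # Assumption: a crash shows up as a nonzero exit status.
        return "crashed", proc.returncode
    return "ok", 0


def drive(inputs: Sequence[str],
          make_cmd: Callable[[str], Sequence[str]],
          limit_s: float,
          cleanup_cmd: Optional[Sequence[str]] = None) -> List[RunResult]:
    """Run one job per input file, cleaning up after any failed run."""
    results = []
    for path in inputs:
        status, rc = run_with_limit(make_cmd(path), limit_s)
        if status != "ok" and cleanup_cmd:
            # Hypothetical cleanup step (e.g. ["lamclean"]) so the next
            # run starts clean; its exit status is deliberately ignored.
            subprocess.run(list(cleanup_cmd), check=False)
        results.append(RunResult(path, status, rc))
    return results
```

For example, `drive(files, lambda f: ["mpirun", "-np", "4", "./legacy_app", f], 3600, ["lamclean"])` would give each input a one-hour limit; the program name and `mpirun` arguments there are placeholders. Whether killing `mpirun` alone is enough is exactly the open question below, so the cleanup hook is kept separate.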

The manual mentions `lamclean', but does it kill only my processes, or other users' processes as well?

Also, is it enough to check the runtime of the `mpirun' script? If I kill `mpirun' after the time limit has been reached, will that also kill the MPI processes on the remote nodes?

Lastly, the search function on the mailing list archive reports `Sorry, you didn't specify any search criteria, so no search was performed.' no matter what text I type in the search box.

Cheers and thanks,
Shaun.