LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-08-03 07:47:02


On 8/2/06 12:30 PM, "Ionel Mugurel Ciobica" <tgakic_at_[hidden]> wrote:

> Thank you, Jeff. I merged the scripts to start all jobs at the same
> time with your suggestion. It is working.

Excellent!
 
> Is it still possible to do it from four different scripts? I mean, even
> running some of those jobs at a later time?

Yes. Once you lamboot, you can mpirun as many times as you wish. If you're
not running in a batch queue environment, there is nothing special about the
shell that you ran lamboot from -- you can mpirun from any shell on any of
the hosts in your LAM universe (including one entirely unrelated to where
you ran lamboot from, such as in a different window or somesuch).

If you are running in a scheduled environment (e.g., SLURM, Torque, etc.),
the shell that you ran lamboot from carries special environment variables
that identify your specific scheduled job (e.g., the SLURM or Torque job
that you are running in). If you mpirun from an unrelated shell (i.e., one
that did not inherit those environment variables), mpirun will not be able
to "find" the LAM universe that you booted. There are ways around that, but
I wouldn't recommend them.

You'll also need to be careful of race conditions with lamboot / lamhalt.
For example, if you do something like:

lamboot
run_script1 &
run_script2 &
run_script3 &
run_script4 &
lamhalt

Because the scripts are launched in the background, lamhalt runs right away,
and you'll probably kill all your jobs because they're still running at that
point.

The above is obviously a trivial/contrived example -- much more complex
scenarios can occur, too. My point is that you need to be careful not to
launch any scripts until lamboot has completed, and not to run lamhalt until
all your scripts/applications have completed.
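
One way to avoid the race in the trivial example above is sketched below.
It is only a sketch under a few assumptions: run_script1 through run_script4
are the placeholder names from that example, the function name
run_jobs_then_halt is made up for illustration, and each script is assumed
not to return until its own MPI jobs have finished. It relies on the shell's
"wait" builtin to block until every backgrounded script has exited.

```shell
#!/bin/sh
# Sketch: boot the LAM universe, run four scripts concurrently, and halt
# the universe only after all of them have exited.
# run_jobs_then_halt is a hypothetical name; run_script1..run_script4 are
# the placeholder scripts from the example above.
run_jobs_then_halt() {
    # Launch nothing unless lamboot has completed successfully.
    # Any arguments (e.g. a boot schema file) are passed through to lamboot.
    lamboot "$@" || return 1

    run_script1 &
    run_script2 &
    run_script3 &
    run_script4 &

    # With no arguments, "wait" blocks until every background job started
    # by this shell has exited.
    wait

    # Nothing is using the universe any more, so it is safe to halt it.
    lamhalt
}
```

Calling run_jobs_then_halt at the end of the script runs the whole sequence.
If some jobs are started later from other shells, this pattern no longer
covers them, and lamhalt has to be held back until those have finished too.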

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems