
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-08-12 20:57:07


On Tue, 12 Aug 2003, todd keeler wrote:

> I would like to run system commands on each node in my cluster, but when
> I run my program, only the node that I'm actually at ends up running the
> command. Do I need to tell lam to direct certain commands to the nodes
> that the process is running on, instead of the one I called mpirun from?

It sounds like you are mixing up the concepts of running in serial and
running in parallel. When you run something at a shell prompt, it is run
on a single node (i.e., the node that you run it on). This is normal unix
semantics.

LAM is an MPI run-time environment. It allows you to run MPI programs in
parallel. For example, you can write an MPI program and run it in
parallel with the mpirun(1) command. Users can also run serial unix
commands in a distributed fashion with the lamexec(1) command (see its man
page for more information). Note that LAM explicitly disallows root from
running any processes in the LAM run-time environment (you'll get an error
message if you try).
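A quick way to see where a process really runs is to have it report its host name. The sketch below is illustrative only. It assumes a POSIX system, and the helper names current_hostname() and where_am_i() are hypothetical, not part of LAM. Started from a plain shell prompt, it reports only the local node. Copies started with mpirun(1) should each report the node they were placed on.

```cpp
#include <string>
#include <unistd.h>  // POSIX gethostname(), getpid()

// Hypothetical helper: returns the name of the host this process is
// actually running on, or an empty string if it cannot be determined.
std::string current_hostname() {
    char buf[256] = {0};
    // Pass one byte less than the buffer so the result stays NUL-terminated.
    if (gethostname(buf, sizeof(buf) - 1) != 0) {
        return "";
    }
    return std::string(buf);
}

// Hypothetical helper: builds a one-line report such as "pid 1234 on node01".
std::string where_am_i() {
    return "pid " + std::to_string(static_cast<long>(getpid())) +
           " on " + current_hostname();
}
```

Printing where_am_i() from each process is enough to tell whether your processes are spread across the cluster or all running on the node you typed the command on.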

> I'm using the c++ system(*char) method to pass the commands to the
> shell, the errors I get are..
>
> sh: line 1: "command": command not found.

This means that it tried to find an executable named "command" and
couldn't find it. I'm guessing/assuming you are using the system(3)
library call. See the man page for system(3) for more details (this is
somewhat outside the scope of this mailing list -- this list is intended
for questions about LAM/MPI, not general C/C++ problems).
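The "command not found" message comes from sh, which system(3) uses to run the string you pass it. Checking what system() returns makes such failures visible in the program itself rather than only as text on a terminal. A minimal sketch, assuming a POSIX system; run_shell_command() and command_not_found() are hypothetical helper names.

```cpp
#include <cstdlib>     // std::system
#include <string>
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS (POSIX)

// Hypothetical helper: runs 'cmd' through /bin/sh via std::system and
// returns the shell's exit status, or -1 if the shell could not be started
// or did not exit normally (e.g. it was killed by a signal).
int run_shell_command(const std::string& cmd) {
    int status = std::system(cmd.c_str());
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}

// POSIX shells exit with status 127 when a command cannot be found (this
// also covers the rare case where the shell itself could not be executed).
bool command_not_found(int exit_status) {
    return exit_status == 127;
}
```

For example, run_shell_command("exit 3") returns 3, while a misspelled or missing command gives 127, which command_not_found() detects.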

> The main node runs the commands just fine..

This is not very specific, so I'm not sure exactly what you mean. My
first guess is that you have an executable that exists on the
local disk on one node in your cluster, but does not exist on the other
nodes in the cluster. If you want to run an executable on any node, it
must be available to run on that node -- either on the local disk or
through a networked filesystem such as NFS. For example, if you compile a
"hello world" program on node A in the cluster, you can't run it on node B
unless B can find the executable in a filesystem somewhere.
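One way to confirm this is to check, on each node, whether the program can actually be found there. The sketch below assumes a POSIX system; executable_available() and found_in_path() are hypothetical helper names. found_in_path() only approximates the shell's lookup: it walks $PATH but ignores shell builtins, aliases, and functions.

```cpp
#include <cstdlib>     // std::getenv
#include <sstream>
#include <string>
#include <sys/stat.h>  // stat(), S_ISREG (POSIX)
#include <unistd.h>    // access(), X_OK (POSIX)

// Hypothetical helper: true if 'path' is a regular file that this process
// may execute on the node where it is currently running.
bool executable_available(const std::string& path) {
    struct stat st;
    return stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode) &&
           access(path.c_str(), X_OK) == 0;
}

// Hypothetical helper: looks up a command roughly the way sh does, by trying
// each directory listed in $PATH on this node.
bool found_in_path(const std::string& name) {
    if (name.find('/') != std::string::npos) {
        return executable_available(name);  // explicit path: no $PATH search
    }
    const char* path_env = std::getenv("PATH");
    if (path_env == nullptr) {
        return false;
    }
    std::stringstream dirs(path_env);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) {
            dir = ".";  // an empty $PATH entry means the current directory
        }
        if (executable_available(dir + "/" + name)) {
            return true;
        }
    }
    return false;
}
```

Running such a check from every process of a parallel job would quickly show which nodes are missing the executable (or cannot see the filesystem it lives on).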

(there are corner cases to what I said above, but this seems like a
fundamental misunderstanding of clusters and unix semantics, so the
general explanations seem like the best to start with :-)

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/