Okay, I realize my last post was terribly
ambiguous..
First of all, let me clarify what I meant by "system
command": running a binary program using system(3).
I'm attempting to take a specific binary program and
make it run in parallel, based on the command-line
parameters it takes. I would like each node to run
the binary with its own parameters. So my MPI C++
code sets up the parameters and runs the binary
with different parameters based on the rank of
the node and the size of the cluster. The binary is
available on every node.
I considered using lamexec to do this, but it seemed
a lot easier to set up the parameters per node in MPI
code than in a complicated script.
I was using system(3) to run the binary on each node,
assuming that the binary would be executed wherever
the code was running.
So node 0 (the node calling mpirun) ran the binary
just fine, but the other nodes reported:
sh: line 1: "binary": command not found
I'm sure that when logged into the other nodes via
ssh, I can access the binary. I even provided the
full path in the argument to system(3), and I still
get the errors.
So I'm naively assuming that the system(3) call is
being rerouted somewhere else by mpirun, similar to
how cout prints to the terminal on the screen running
mpirun. I'd like to know how to make it run on the
node the code is actually executing on.
I'm using rsh/ssh as my boot module, with LAM 7.0.
All my other MPI code (without system()) runs fine,
so I'm pretty sure the problem isn't with the LAM
installation.
Any help would be marvelous..