Here is a snippet from a program (a.out) that successfully used system()
to run an MPI job under LAM 6.6b1:

    saveRunline(mpirunLine,0);
    //printf("%s\n",mpirunLine);
    saveRoutes();
    system("./cpHosts");
    system("lamboot -l lamhosts");
    printf("%s\n",mpirunLine);
    system(mpirunLine);
    system("lamhalt");

All you have to know is:
1. a.out is in /home/me
2. cpHosts is also in /home/me
3. /home/me is an NFS directory across all nodes with the same absolute path
4. it worked.
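
Since the snippet above ignores the return value of every system() call, a
failure in any step goes unnoticed. As a rough sketch (not part of the
original program), each call could go through a small wrapper that reports
the exit status. The name run_checked is hypothetical, and the code assumes
a POSIX system where <sys/wait.h> is available.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper, not from the original post: run a command with
 * system(3) and return its exit status, or -1 if the command could not
 * be started or did not exit normally. */
int run_checked(const char *cmd)
{
    int status = system(cmd);

    if (status == -1) {            /* fork or wait failed */
        perror("system");
        return -1;
    }
    if (!WIFEXITED(status)) {      /* e.g. killed by a signal */
        fprintf(stderr, "'%s' did not exit normally\n", cmd);
        return -1;
    }
    if (WEXITSTATUS(status) != 0) {
        /* The shell conventionally reports 127 when the command
         * itself could not be found or executed. */
        fprintf(stderr, "'%s' exited with status %d\n",
                cmd, WEXITSTATUS(status));
    }
    return WEXITSTATUS(status);
}
```

With this in place, a call such as run_checked("lamboot -l lamhosts") could
be tested against 0 before going on to the mpirun line, so the first failing
step is the one that gets reported.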
From your post:

> I considered using lamexec to do this, but setting up
> the parameters by node seemed a lot easier writing some
> mpi code, instead of a complicated script.

It appears that you want LAM to run your script on each of the nodes via
the lamexec command, with each script receiving different parameters.
Unfortunately I haven't used lamexec, so I don't know how it handles
arguments. Perhaps you have to pass the arguments in single quotes (like
rsh 'uptime') so that they are not evaluated until they reach the remote
machine. One test would be lamexec $HOSTNAME, or some variation.
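
To illustrate the quoting point, here is a small sketch that uses a nested
"sh -c" as a stand-in for the remote shell. It only shows how the local
shell started by system() or popen() treats the two kinds of quotes.
Whether lamexec passes its arguments through a shell at all is not
something I can confirm. The names first_line, quoting_demo, and the
variable WHERE are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run cmd through the shell and copy the first line
 * of its output (without the newline) into buf. Returns 0 on success. */
int first_line(const char *cmd, char *buf, size_t len)
{
    FILE *p = popen(cmd, "r");

    if (p == NULL)
        return -1;
    if (fgets(buf, (int)len, p) == NULL)
        buf[0] = '\0';
    buf[strcspn(buf, "\n")] = '\0';
    return pclose(p) == -1 ? -1 : 0;
}

/* The outer shell sees WHERE=outer; the nested shell is given WHERE=inner. */
void quoting_demo(void)
{
    char out[64];

    setenv("WHERE", "outer", 1);

    /* Double quotes: the outer shell expands $WHERE before the nested
     * shell ever runs. */
    if (first_line("WHERE=inner sh -c \"echo $WHERE\"", out, sizeof out) == 0)
        printf("double quotes: %s\n", out);   /* prints "outer" */

    /* Single quotes: $WHERE is passed through literally and expanded
     * by the nested shell. */
    if (first_line("WHERE=inner sh -c 'echo $WHERE'", out, sizeof out) == 0)
        printf("single quotes: %s\n", out);   /* prints "inner" */
}
```

The same distinction applies to a string handed to system(): anything not
protected by single quotes is expanded by the local /bin/sh first.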

> I'm sure when logged into the other nodes by ssh, they
> can access the binary. I even provided the direct
> path in the argument to system(3), and I still get the
> errors.

Can you verify this, so that "I'm sure" changes to "I know that," perhaps
by running the lamexec command from the command line instead of within the
C++ code?
-j