On Oct 24, 2005, at 12:46 PM, Austin Leach wrote:
> [snipped]
> in this area. That being said, I am using the following command to run
> my application.
>
> mpirun C -s n0 ../../myprogram < myinfile
> (the executable is not in the same dir as the infiles)
This all sounds reasonable.
> LAM is booting on the nodes successfully, so that isn't a problem.
> However, I can only get the command above to work if I remove
> 'schedule=no' from the machine file. The -s option is the only way I
> have been able to get a job to run, and I'm relatively sure that if -s
> is used, then the node MUST be schedulable.
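-s is unrelated to scheduling: it tells LAM to load the executable from
the named node (n0 here) rather than from each node's local filesystem.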
> My guess is that there is a path problem? I have read through the
> FAQs and haven't been able to figure out the problem. The directory
> from which I execute mpirun is present on all nodes, and the
> permissions of and inside of said directory are rwxrwxrwx.
Sidenote: remember that the unix permissions bits are [mostly]
irrelevant in an AFS filesystem.
> With the headnode set to schedule=no,
>
> When I do: "mpirun C ../../myprogram < myinfile"
> I get: mpirun: cannot start ../../myprogram on n1: No such file or directory
>
> (Remember the working directory is present on all nodes in the same
> location from which mpirun was executed)
Hum. For the working directory, that's quite odd -- LAM should set the
working directory on all nodes to the directory from which mpirun was
invoked (unless changing to that directory fails, in which case the
processes start in your $HOME on the remote nodes).
Note, however, that LAM only gives stdin to the lowest MPI_COMM_WORLD
rank on the same node as mpirun -- so I think you're not going to be
able to distribute your input in this way. Perhaps changing your
command line and code to be something like this might work:
mpirun ... ../../myprogram -input myinfile
(and then have only MPI_COMM_WORLD rank 0 pay attention to the -input
argument)
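A minimal C sketch of that approach, assuming the whole input file fits in memory. The `-input` flag comes from the suggestion above; the helper names `find_input_arg` and `read_whole_file` and the `USE_MPI` build switch are hypothetical. Rank 0 reads the file and MPI_Bcast hands the bytes to every other rank, so no process depends on receiving stdin.

```c
/* Sketch only: rank 0 of MPI_COMM_WORLD reads the file named by -input
 * and broadcasts its contents. Helper names and the USE_MPI switch are
 * hypothetical; the file is assumed to fit in memory. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Return the argument that follows "-input", or NULL if there is none. */
const char *find_input_arg(int argc, char **argv)
{
    for (int i = 1; i + 1 < argc; ++i) {
        if (strcmp(argv[i], "-input") == 0)
            return argv[i + 1];
    }
    return NULL;
}

/* Read a whole file into a NUL-terminated heap buffer.
 * Stores the byte count in *len; returns NULL on any error. */
char *read_whole_file(const char *path, long *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long n = ftell(fp);
    if (n < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }
    char *buf = malloc((size_t)n + 1);
    if (buf == NULL) { fclose(fp); return NULL; }
    if (fread(buf, 1, (size_t)n, fp) != (size_t)n) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    buf[n] = '\0';
    fclose(fp);
    *len = n;
    return buf;
}

#ifdef USE_MPI
int main(int argc, char **argv)
{
    int rank;
    long len = 0;
    char *input = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Only rank 0 looks at argv and touches the filesystem. */
    if (rank == 0) {
        const char *path = find_input_arg(argc, argv);
        input = path ? read_whole_file(path, &len) : NULL;
        if (input == NULL) {
            fprintf(stderr, "rank 0: cannot read -input file\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
    }

    /* Every rank learns the size first, then receives the bytes
     * (including the trailing NUL). */
    MPI_Bcast(&len, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    if (rank != 0) {
        input = malloc((size_t)len + 1);
        if (input == NULL)
            MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Bcast(input, (int)len + 1, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* ... parse `input` here the way the program used to parse stdin ... */

    free(input);
    MPI_Finalize();
    return 0;
}
#endif
```

Built with something like `mpicc -DUSE_MPI myprogram.c -o myprogram` and launched as `mpirun C ../../myprogram -input myinfile`, only rank 0 needs to be able to open myinfile (relative to its working directory). Having only rank 0 inspect argv also avoids relying on every rank receiving the command-line arguments.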
FWIW, Open MPI passes the stdin of mpirun to all MPI processes.
> When I do: "mpirun -wd path/to/dir C ../../myprogram < myinfile"
> The application starts and ends without actually doing anything. (I'm
> not sure how to describe this... it prints the program header, then
> gives the total time for the run, and exits.)
This is probably a symptom of the problem above (LAM not giving stdin
to the processes you are expecting to get it).
> However, when I use the -s n0 option with the schedule=no removed, from
> the same directory, the application runs seemingly OK, just with the
> CPU usage not distributed across the cluster as I would like.
I assume you mean that it's using the node that you don't want it to
use, right?
As another experiment, try without schedule=no and without -s and just
do:
mpirun n1,n2,n3,n4 ../../myprogram -input myinfile
(which is effectively what schedule=no/C should do)
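For reference, a LAM boot schema of the kind being discussed might look like the following. The hostnames are hypothetical placeholders, since the actual machine file isn't shown in this thread.

```
# Hypothetical LAM boot schema; hostnames are placeholders.
# schedule=no keeps the head node out of the C and N shortcuts in mpirun.
head.example.com schedule=no
node1.example.com
node2.example.com
node3.example.com
node4.example.com
```

Once that file is booted with lamboot, "mpirun C ..." should place processes only on the four compute nodes, the same set the explicit n1,n2,n3,n4 list selects by hand.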
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/