Sorry to take so long to reply. :-(
Check the LAM man pages for MPI_Comm_spawn(3) and
MPI_Comm_spawn_multiple(3). You can use the MPI_Info argument in a
variety of ways that should help here, including specifying an app
schema that gives the location of files (such as your slave
executable) on different nodes.
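Here is a minimal sketch of the app schema idea. It assumes you know which LAM node numbers (as reported by lamnodes) correspond to which machines. The master writes one line per node naming the absolute path of the slave binary on that node, saves it to a file, and passes the file name through the info key "lam_spawn_file" (e.g. MPI_Info_set(info, "lam_spawn_file", "slaves.schema")) before calling MPI_Comm_spawn. I believe that when this key is set, the command/argv/maxprocs arguments are ignored in favor of the schema. The "n<node> <program>" line format is my reading of appschema(5), and the struct and function names are hypothetical, so double-check both against the man pages for your LAM version.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-node entry: a LAM node number (as shown by
   lamnodes) and the absolute path of the slave binary there. */
struct slave_location {
    int node;          /* written as "n<node>" in the schema */
    const char *path;  /* slave location on that machine */
};

/* Write an app schema into buf, one line per node, for example
   "n1 /usr/local/bin/slave".  Each line starts one slave on that node.
   Returns the number of characters written (excluding the NUL), or -1
   if an entry has an empty path or buf is too small. */
int build_app_schema(char *buf, size_t size,
                     const struct slave_location *locs, size_t n)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < n; ++i) {
        if (locs[i].path == NULL || strlen(locs[i].path) == 0)
            return -1;
        int len = snprintf(buf + used, size - used, "n%d %s\n",
                           locs[i].node, locs[i].path);
        if (len < 0 || (size_t)len >= size - used)
            return -1;   /* output would have been truncated */
        used += (size_t)len;
    }
    return (int)used;
}

/* Save the schema so its file name can be handed to MPI_Comm_spawn
   through the "lam_spawn_file" info key.  Returns 0 on success. */
int write_app_schema(const char *filename, const char *schema)
{
    FILE *fp = fopen(filename, "w");
    if (fp == NULL)
        return -1;
    int rc = (fputs(schema, fp) == EOF) ? -1 : 0;
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}
```

Because every schema line names a node explicitly, the slaves should be started on the machines you list rather than all on the node where mpirun was run, and each can use a different path to its locally compiled binary.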
On Sep 22, 2005, at 5:56 PM, Douglas Vechinski wrote:
>
> I created an MPI application that uses two different executables: 1
> master and N slaves. I start the process by running "mpirun -np 1
> master". The master reads a config file that specifies the number
> of slaves (N) and the name of the slave executable. The master does
> some other initialization and then uses MPI_Comm_spawn to spawn N
> slaves, using the slave executable name from the input file. The
> slaves then ask the master for work, which the master hands out
> based on the problem I am solving. Broken up this way, the master is
> a really small code and doesn't contain all the processing code that
> the slaves do.
>
> Now this all works fine and dandy when I run it in a parallel
> environment that is basically one machine with many processors, all of
> the same type, sharing a common directory structure.
>
> However, I am now presented with a different environment: several Linux
> PCs on a common network, not necessarily running the same version/flavor
> of Linux. Right now I am assuming that one machine will have an
> exported filesystem that all the other machines can mount, and that
> all input/output files will live on this filesystem.
>
> I am now trying to figure out whether it is possible to run this app
> in the LAM environment. The main problem I am having is the name of
> the slave executable passed to MPI_Comm_spawn. The slaves may now be
> different binaries (same source, just compiled separately on the
> different machines), and these executables may be in different
> locations on the different machines.
>
> I first tried setting an environment variable (SLAVE) on each machine
> giving the directory of the slave on that machine. Then, when
> spawning, I used "$SLAVE/slave" as the first argument, thinking that
> $SLAVE might get expanded on the remote hosts. This didn't seem to
> work.
>
> Next I tried modifying my PATH on each node so that the slave
> executable is found on it, and then supplied just the name "slave" to
> the spawn function. This ran, but all the slave processes started and
> ran on the machine where I issued the mpirun command. None ran on any
> of the remote machines.
>
> I'm looking for suggestions on how I could get this setup to work
> with LAM. Or do I need to consider combining the master and slaves
> into a single code and letting the rank 0 guy take a different branch
> and control the slaves that way? That would mean the master carries
> all the slaves' processing code without using it, and requests a
> whole bunch of memory that would also go unused.
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
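If you do end up folding master and slave into one executable, the memory concern is mostly avoidable: large work arrays only cost memory in the processes that actually allocate them. A minimal sketch, assuming the slave's buffers are allocated dynamically inside the slave branch; the function names and the buffer size are hypothetical, and in a real program rank would come from MPI_Comm_rank() on MPI_COMM_WORLD. Rank 0 still maps the slave's code, but that is usually small next to its data.

```c
#include <stdlib.h>

/* Hypothetical number of doubles in the slave's working buffer. */
#define SLAVE_WORK_ELEMS (1u << 20)

/* Master branch: only hands out work, so it allocates no big buffers.
   Returns the bytes of work buffer allocated (for illustration). */
static size_t master_main(void)
{
    /* ... receive work requests from ranks 1..N and reply ... */
    return 0;
}

/* Slave branch: the large buffer is allocated here, at run time, so
   the rank 0 process never requests it. */
static size_t slave_main(void)
{
    double *work = malloc(SLAVE_WORK_ELEMS * sizeof *work);
    if (work == NULL)
        return 0;
    /* ... ask rank 0 for work and process it in 'work' ... */
    free(work);
    return SLAVE_WORK_ELEMS * sizeof(double);
}

/* Dispatch on rank; in the combined MPI program, rank is obtained from
   MPI_Comm_rank(MPI_COMM_WORLD, &rank) right after MPI_Init. */
size_t run_role(int rank)
{
    return rank == 0 ? master_main() : slave_main();
}
```

The key design choice is to keep the slave's big allocations out of static storage and out of any code path rank 0 executes, so the combined binary behaves like your separate master in terms of data footprint.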
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/