
LAM/MPI General User's Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-04-26 09:16:46


On Apr 25, 2006, at 12:52 PM, kmkb4i902_at_[hidden] wrote:

> I am trying to use the canonical MPI_Comm_spawn() example from the
> MPI-2 report (see for example
> http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/node98.htm )
>
> The environment is: RedHat Linux ES 3, LAM 7.1.2
>
> The problem is that the slaves are spawned fine but they seem to
> hang in MPI_Init() while the master is still in MPI_Comm_spawn().
> I've done some googling but it didn't yield any usable results. Is
> there something I'm missing?
>
> Both programs are attached below. The only difference over the code
> in the URL above is changed diagnostics.

I'm unable to replicate your problem on our systems. How many
processes are you trying to start? The only thing I can think of is
that you've hit an internal limit on the number of processes a lamd
can start (approximately 60 at a time). If that's not the case, can
you compile LAM with debugging symbols and get a backtrace for where
the applications are hung?
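
For the backtrace itself, something along these lines might do as a rough sketch. It assumes gdb and pgrep are installed on the node, and "slave" is only a placeholder for whatever your spawned executable is actually called; the dump_backtraces helper is made up for illustration.

```shell
# Sketch: print a backtrace for every running process with the given name.
# The helper name and the "slave" executable name are hypothetical.
dump_backtraces() {
    name="$1"
    # pgrep -x matches the process name exactly; no match means no output.
    for pid in $(pgrep -x "$name" 2>/dev/null); do
        echo "=== backtrace for PID $pid ==="
        # -batch runs the given commands and exits;
        # "thread apply all bt" prints the stack of every thread.
        gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
    done
}

# Usage (run on the node where the processes are hung):
#   dump_backtraces slave
```

The frames are only readable if both LAM and the application were built with debugging symbols (for example with -g).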

One other possible sticking point has come up in the past: if LAM
can't allocate enough shared memory for the shared memory
communication device, the failure sometimes goes undetected and leads
to bad communication topologies with SPAWN. You might want to try
forcing TCP communication by setting the environment variable
LAM_MPI_SSI_rpi to "tcp".
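
As a minimal sketch of the TCP suggestion, assuming a Bourne-style shell and a master program named "./manager" (a hypothetical name; use your own binary):

```shell
# Select the TCP RPI for every LAM/MPI job started from this shell.
export LAM_MPI_SSI_rpi=tcp

# "./manager" is a hypothetical name for the program that calls
# MPI_Comm_spawn(). The guard just keeps the snippet harmless on a
# machine without LAM or without that binary.
if command -v mpirun >/dev/null 2>&1 && [ -x ./manager ]; then
    mpirun -np 1 ./manager
fi
```

LAM's mpirun should also accept the same setting on the command line as "-ssi rpi tcp", which avoids changing the environment for other jobs.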

> P.S. In case this is read by the person responsible for the mailing
> list management: mailing list search is broken and doesn't yield
> any results. Searching using Google over the site does work fine
> but all documents show up with the same title: "LAM/MPI General
> User's Mailing List Archives." This is very easy to fix: the code
> that generates a PHP page for each message should just set the
> title to the actual subject line of the message instead of the
> generic title.

Yes, this is a known issue. Our university-wide search engine staff
has apparently decided that they no longer want to index mailing
list archives. It's on our to-do list to fix, but maintaining two MPI
implementations sucks up most of our time these days ;).

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/