On Wed, 7 May 2003, krishna kumar rangan wrote:
> I have a server process that creates a single child using ->
> [snipped]
> and it works successfully.
>
> However, when the server tries to create two or more processes using -
>
> MPI_Comm_spawn(progName, argv, 2, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &child,
> err);
>
> LAM prints out the following and aborts -
>
> MPI 2 C++ exception throwing is disabled, MPI::errno has the error code
> spawn : Failed to complete a MPI operation.
I'm assuming that you're actually using the C++ bindings, right?
> I checked the universe_size attribute on the COMM_WORLD. It returns
> the default 4. The last argument to Spawn, 'err', is an array of 10
> integers. I tried the MPI_ERRCODES_IGNORE in vain.
So something else is going wrong, and you're trying to find out what,
right?
One option might be to set the error handler to MPI_ERRORS_RETURN
(MPI::ERRORS_RETURN) and see what the returned error codes are.
Some other quick things to check:
- is the app that you're trying to launch available on all nodes?
- are all required shared libraries available on all nodes, and/or
are the proper LD_LIBRARY_PATH settings in place?
> LAM configure line:
> ===================
>
> $ ./configure --prefix=/proj/lsv2/lam-6.5.9 --without-fc --with-threads
> --with-exceptions
What compiler are you using? If gcc, what version?
> The server process is initialized with thread support as:
> ========================================================
> start()
> {
> ...
> int granted;
> MPI_Init_thread(0, NULL, MPI_THREAD_MULTIPLE, &granted);
> // lam presently does not support mpi_thread_multiple. it only returns
> // mpi_thread_serialized - we wrap every mpi call within a mutex.
> assert(granted >= MPI_THREAD_SERIALIZED);
> MPI::ERRORS_THROW_EXCEPTIONS.init();
> MPI::COMM_WORLD.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);
> }
This is a separate issue from the spawn problem; if I'm reading
correctly, you only set this up because you're trying to find out what
is going wrong with spawn, right?
Also note that LAM 7.0 supports thread levels up to and including
MPI_THREAD_SERIALIZED -- it does *not* support MPI_THREAD_MULTIPLE
(indeed, attempts to use it will result in LAM being unable to find
SSI modules that will operate at MPI_THREAD_MULTIPLE, and it will
abort). More details will be available in the forthcoming LAM 7.0
User's Guide, but the general idea is that LAM will not fall back to a
level *lower* than the one requested, so if you request something that
it can't support, it will abort.
> I couldn't get the exception handling to print any information on
> what the problem could be. Can someone throw some light?
I notice that we don't seem to have good configure tests in 6.5.9 for
the necessary compiler flags, such that with gcc 3.x, even if you use
--with-exceptions, it might not get enabled. :-(
You might want to try your code with the latest 7.0 beta for the following
reasons:
- It has MPI::Init_thread()
- Therefore, you don't have to do MPI::ERRORS_THROW_EXCEPTIONS.init()
- The configure tests are better and --with-exceptions does the Right
Things
See http://www.lam-mpi.org/beta/ for more details.
Does this help?
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/