
LAM/MPI General User's Mailing List Archives


From: Pak, Anne O (anne.o.pak_at_[hidden])
Date: 2003-06-23 12:53:42


hello:

i have a configuration where a single master spawns about 15 slave nodes and
runs the same piece of code on all 15 slaves. in this piece of code, i
have something like (pseudo-code):

call subroutine_A(arguments)
do some stuff
call subroutine_B(arguments)

about half of the slave nodes execute A and B
about 2-3 of them execute A only
the rest of them execute A twice and not B

is there any reason why this would happen, given that all slaves are
supposed to be running the same C code?
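
one way to narrow this down might be to have every slave tag its trace
output with its rank and flush it right away, so that lines from 15
processes can be told apart and are not held back in a buffer. the sketch
below is only an illustration: format_trace and trace are hypothetical
helper names, and the rank is assumed to come from MPI_Comm_rank in the
real slave code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: builds one trace line such as "[rank 3] enter A".
 * Returns the number of characters written (excluding the terminating
 * null), or -1 if the buffer is too small. */
int format_trace(char *buf, size_t size, int rank, const char *event)
{
    assert(buf != NULL && event != NULL);
    int n = snprintf(buf, size, "[rank %d] %s", rank, event);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Hypothetical helper: writes the trace line to stderr and flushes it
 * immediately, so nothing stays behind in a stdio buffer. */
void trace(int rank, const char *event)
{
    char line[128];
    if (format_trace(line, sizeof line, rank, event) >= 0) {
        fprintf(stderr, "%s\n", line);
        fflush(stderr);
    }
}
```

calling trace(rank, "enter A"), trace(rank, "leave A") and the same for B
around the two calls would show, per slave, which calls were really made
and which only appear duplicated or missing in the combined output.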

i am using MPI_Comm_spawn to spawn these slave processes, and i am using a
schema file (specified by an MPI_Info variable passed as one of the
arguments to MPI_Comm_spawn).

the slave routine is an excerpt from another program. it worked in the
other program and no changes were made to the code when i copied it, so
it should still work :(

could it be a memory limitation at some of the slave nodes? the calling
routine for this C program is a MEX function and this MEX function is being
sent about 50 arguments. perhaps there is a limit on the number of
arguments that can be passed to a C function?
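
for reference, the C99 standard requires an implementation to accept at
least 127 arguments in a single function call (C89 only required 31), so
50 should be within the limit on a C99 compiler. if the long argument list
is a concern anyway, one common alternative is to bundle the values into a
struct and pass a single pointer. the sketch below is only an illustration:
slave_params, its fields, and sum_input are hypothetical placeholders, not
the real arguments of the MEX function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter bundle; the field names are placeholders. */
typedef struct {
    int           n_samples;  /* number of entries in input */
    double        tolerance;  /* example scalar parameter (unused here) */
    const double *input;      /* borrowed pointer, not owned by the struct */
} slave_params;

/* Takes one pointer instead of dozens of positional arguments and
 * returns the sum of the input array. */
double sum_input(const slave_params *p)
{
    assert(p != NULL && p->input != NULL);
    double total = 0.0;
    for (int i = 0; i < p->n_samples; ++i)
        total += p->input[i];
    return total;
}
```

adding a parameter then means adding a field, and the call sites that pass
the struct pointer along do not have to change.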

lastly, i can't seem to run a program over a cluster two consecutive times.
i always seem to need to execute a 'wipe' and 'lamboot' between runs. does
it have to do with me calling MPI_Comm_disconnect at the conclusion of the
first run and then MPI_Init at the start of the second one?

thanks,

anne

___________________________________________________
Anne Pak, L1-50
Building 153 2G8
1111 Lockheed Martin Way
Sunnyvale, CA 94089
(408) 742-4369 (W)
(408) 742-4697 (F)