Hello,
I have a configuration where a single master spawns about 15 slave nodes and
runs the same piece of code on all 15 slaves. In this piece of code, I
have something like this (pseudo-code):
call subroutine_A(arguments)
do some stuff
call subroutine_B(arguments)
About half of the slave nodes execute A and B.
About 2-3 of them execute A only.
The rest of them execute A twice and never execute B.
Is there any reason why this would happen, given that all slaves are supposed to
be running the same C code?
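One way to narrow this down would be to have every slave record which subroutine it enters, tagged with its own rank and written to its own file, so that output from different slaves cannot be mixed together or lost in buffering. The following is only a minimal sketch under stated assumptions: format_trace and trace_event are hypothetical helper names, the log file name is made up, and rank is assumed to be obtained elsewhere (for example from MPI_Comm_rank).

```c
#include <stdio.h>

/* Hypothetical helper: format one trace line tagged with the process rank
 * and a sequence number. Returns the value of snprintf (characters needed). */
int format_trace(char *buf, size_t len, int rank, int seq, const char *event)
{
    return snprintf(buf, len, "[rank %d #%d] %s", rank, seq, event);
}

/* Hypothetical helper: append one trace line to a per-rank log file.
 * The file is flushed and closed on every call so the record survives
 * even if the process dies right afterwards.
 * Returns the sequence number of the event, or -1 if the file cannot be opened. */
int trace_event(int rank, const char *event)
{
    static int seq = 0;          /* counts events within this process */
    char path[64];
    char line[256];
    FILE *fp;

    /* Assumed file naming scheme: one log per rank in the working directory. */
    snprintf(path, sizeof path, "trace_rank%d.log", rank);
    fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    format_trace(line, sizeof line, rank, ++seq, event);
    fprintf(fp, "%s\n", line);
    fflush(fp);
    fclose(fp);
    return seq;
}
```

Calling trace_event(rank, "enter subroutine_A") at the top of subroutine_A, and the equivalent at the top of subroutine_B, would show the actual call sequence for each slave separately.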
I am using MPI_Comm_spawn to spawn these slave processes, and I am using a
schema file (specified by the MPI_Info variable passed as one of the arguments to
MPI_Comm_spawn).
The slave routine was excerpted from another program. It worked in
the other program, and no changes were made to the code when I copied it, so
it should still work. :(
Could it be a memory limitation at some of the slave nodes? The calling
routine for this C program is a MEX function, and this MEX function is being
sent about 50 arguments. Perhaps there is a limit on the number of
arguments that can be sent to a C function?
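For reference, the C99 standard requires a compiler to accept at least 127 arguments in one function call (C89 required only 31), so whether 50 arguments is safe depends on the compiler in use. If the argument count is a concern, one common workaround is to bundle the values into a single struct and pass a pointer to it. The following is a minimal sketch only; slave_params, its fields, and scaled_sum are hypothetical names chosen for illustration and are not taken from the actual program.

```c
/* Hypothetical parameter bundle; the field names are illustrative only. */
typedef struct {
    int           n_samples;   /* number of entries in input */
    double        gain;        /* scale factor applied to each entry */
    const double *input;       /* data supplied by the caller */
} slave_params;

/* Takes one pointer instead of many separate arguments. */
double scaled_sum(const slave_params *p)
{
    double total = 0.0;
    int i;

    for (i = 0; i < p->n_samples; i++)
        total += p->gain * p->input[i];
    return total;
}
```

With this layout, adding another parameter means adding a field to the struct rather than changing every function signature along the call chain.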
Lastly, I can't seem to run a program over the cluster two consecutive times.
I always seem to need to execute a 'wipe' and 'lamboot' between calls. Does
it have to do with me calling mpi_disconnect at the conclusion of the first
run and then MPI_Init at the start of the second one?
Thanks,
Anne
___________________________________________________
Anne Pak, L1-50
Building 153 2G8
1111 Lockheed Martin Way
Sunnyvale, CA 94089
(408) 742-4369 (W)
(408) 742-4697 (F)