Hi Ravi,
I developed an application very similar to yours. In my application
the number of subdomains is constant in number and position, so I have
a Cartesian topology in which every subdomain exchanges information
with the same neighbouring subdomains in each iteration. Taking this
into account, I used MPI_Cart_create(...) to build a virtual topology,
and then MPI_Send_init(...)/MPI_Recv_init(...) to set up the persistent
requests needed between neighbouring subdomains. All of this is done
outside the main loop, which avoids the overhead of setting up the
communication in every iteration. Inside the main loop I just call
MPI_Start(...) (or MPI_Startall(...)) to initiate the information exchange.
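
Here is a minimal sketch of that setup in C, assuming a 1D strip
decomposition along one axis; the buffer names and counts are
placeholders, not my actual code:

#include <mpi.h>

/* Persistent halo exchange: set up once, reused every iteration. */
void setup_and_iterate(double *send_left, double *send_right,
                       double *recv_left, double *recv_right,
                       int count, int niter)
{
    MPI_Comm cart;
    int nprocs, dims[1] = {0}, periods[1] = {0};
    int left, right;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    dims[0] = nprocs;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);
    MPI_Cart_shift(cart, 0, 1, &left, &right);   /* neighbour ranks */

    /* Persistent requests created once, outside the main loop. */
    MPI_Request req[4];
    MPI_Send_init(send_left,  count, MPI_DOUBLE, left,  0, cart, &req[0]);
    MPI_Send_init(send_right, count, MPI_DOUBLE, right, 1, cart, &req[1]);
    MPI_Recv_init(recv_left,  count, MPI_DOUBLE, left,  1, cart, &req[2]);
    MPI_Recv_init(recv_right, count, MPI_DOUBLE, right, 0, cart, &req[3]);

    for (int it = 0; it < niter; ++it) {
        /* pack the current boundary values into send_left/send_right here */
        MPI_Startall(4, req);                 /* reuse the same requests */
        /* ... update interior points that need no remote data ... */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        /* ... update boundary points using recv_left/recv_right ... */
    }

    for (int i = 0; i < 4; ++i)
        MPI_Request_free(&req[i]);
    MPI_Comm_free(&cart);
}
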
On the other hand, the speedup depends on the type of problem. I have
been solving linear and non-linear PDEs; in the first case I get very
good results, but not so good in the second case.
I hope this helps.
Regards,
.--.
Luis M. de la Cruz S. |o_o | PhD. Candidate, IIMAS, UNAM
Cómputo Aplicado, |:_/ | Linux User: 195159
DGSCA, UNAM, México // \ \
Tel. 562-26774 (| | ) ¡L'art c'est l'azur!
http://www.mcc.unam.mx/~lmd /'_ _/`\ Victor Hugo
___)=(___/
On Wed, 20 Apr 2005, Kumar, Ravi Ranjan wrote:
> Hi,
>
> I have written a parallel code in C++/MPI. The parallel code uses the strip
> partitioning method to divide the bigger block (a 3D cuboid) into several
> smaller blocks/slices. Each smaller block is assigned to a processor for
> updating data within it. Interface data has to be exchanged at each iteration.
> Within a slice, all the planes (say rows) of data are marked red & black
> alternately. Red rows are always at both boundaries of the block/slice,
> hence only the interfacial red rows need to exchange data; the black rows do
> not, since they only need data from the neighbouring red rows. Here is the
> pseudo code:
>
> for(time=1; time<=Nt; time++)
>
> {
>
> do
> {
>
> /* overlapped_comm_comp_subroutine */
>
> /* MPI_Allreduce(..to find global max error..) */
>
> } while(convergence criterion is not met)
>
> }
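
Just to make the loop above concrete, here is a rough C sketch of that
structure; sweep_and_local_error() and tol are names I am assuming, not
taken from your code:

#include <mpi.h>

/* hypothetical: one overlapped red-black sweep, returning the local
   maximum residual on this process */
double sweep_and_local_error(void);

void time_march(int Nt, double tol)
{
    for (int t = 1; t <= Nt; ++t) {
        double global_err;
        do {
            double local_err = sweep_and_local_error();
            /* global maximum of the per-process residuals */
            MPI_Allreduce(&local_err, &global_err, 1, MPI_DOUBLE,
                          MPI_MAX, MPI_COMM_WORLD);
        } while (global_err > tol);   /* keep sweeping until converged */
    }
}
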
>
>
> I wish to reduce the wall-clock runtime as much as possible by overlapping
> communication with computation. For this I used non-blocking MPI_Isend/MPI_Irecv
> with MPI_Wait(). Still, my code is not showing good scaling. What can be the
> reason for the poor scalability? Is it because my code is not well optimized? Is
> there any other way I can optimize it more? Please see the pseudo code for
> overlapped_comm_comp_subroutine and tell me if anything is wrong with the
> code.
>
> overlapped_comm_comp_subroutine:
>
> if(rank is even)
> {
> /* update rightmost boundary */
> MPI_Isend(..rightmost boundary to next neighbour..)
> /* update last half red rows */
> MPI_Wait()
> }
>
> else if(rank is odd)
> {
> MPI_Irecv(..from prev neighbour..)
> /* update first half black rows */
> MPI_Wait()
> }
>
> if(rank is odd)
> {
> /* update leftmost red boundary */
> MPI_Isend(..leftmost red boundary to prev neighbour..)
> /* update first half red rows */
> MPI_Wait()
> }
>
> else if(rank is even)
> {
> MPI_Irecv(..updated red boundary from next neighbour..)
> /* update last half black rows */
> MPI_Wait()
> }
>
>
> // SAME THING IS REPEATED FOR OTHER HALF OF THE ROWS //
>
>
> if(rank is even)
> {
> /* update leftmost boundary */
> MPI_Isend(..leftmost boundary to prev neighbour..)
> /* update first half red rows */
> MPI_Wait()
> }
>
> else if(rank is odd)
> {
> MPI_Irecv(..from next neighbour..)
> /* update last half black rows */
> MPI_Wait()
> }
>
> if(rank is odd)
> {
> /* update rightmost red boundary */
> MPI_Isend(..rightmost red boundary to next neighbour..)
> /* update last half red rows */
> MPI_Wait()
> }
> else if(rank is even)
> {
> MPI_Irecv(..updated red boundary from prev neighbour..)
> /* update first half black rows */
> MPI_Wait()
> }
>
>
> //END of Subroutine //
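
For reference, here is a hedged C sketch of just the first even/odd phase
of your subroutine (buffer names, count and tag are assumptions, not taken
from the real code); the main point is that only the work placed between
the MPI_Isend/MPI_Irecv and the MPI_Wait can actually overlap with the
transfer:

#include <mpi.h>

/* First phase: even ranks send their rightmost red boundary to the next
   rank; odd ranks receive it from the previous rank. */
void exchange_rightmost_red(double *sendbuf, double *recvbuf, int count,
                            int rank, int nprocs)
{
    MPI_Request req;

    if (rank % 2 == 0 && rank + 1 < nprocs) {
        /* even rank: update the rightmost red boundary, then ship it */
        MPI_Isend(sendbuf, count, MPI_DOUBLE, rank + 1, 0,
                  MPI_COMM_WORLD, &req);
        /* ... update red rows that do not depend on incoming data,
               i.e. the work that overlaps with the transfer ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank % 2 == 1) {
        /* odd rank: receive the red boundary from the previous rank */
        MPI_Irecv(recvbuf, count, MPI_DOUBLE, rank - 1, 0,
                  MPI_COMM_WORLD, &req);
        /* ... update black rows that do not need recvbuf yet ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* only after MPI_Wait is recvbuf safe to use */
    }
}
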
>
> Thanks a lot!
>
> Ravi R. Kumar
>