LAM/MPI General User's Mailing List Archives


From: Shi Jin (jinzishuai_at_[hidden])
Date: 2005-04-03 11:03:17


Hi there,

I recently ran into a very weird problem. I inherited
somebody's MPI code, but I only want to run it as a
single process, since the problem size is too small to
get any speedup from parallelization. I still
compile the code with mpif90 and run "lamboot
localhost" first. Then I run it directly as ./Codename,
since that is equivalent to "mpirun -np 1 ./Codename".

But my code blew up at some point, and my main
suspicion is the following. I have two lines at the
end of one function:
 call MPI_ALLREDUCE(energy,t1,1,dtype2,MPI_SUM,comm,ierr)
 call MPI_ALLREDUCE(localbanden,t2,1,dtype2,MPI_SUM,comm,ierr)
I suspect that the function returns before
MPI_ALLREDUCE has actually stored the correct values in
t1 and t2. So I tried a simple remedy: I added an
MPI_BARRIER after each MPI_ALLREDUCE, and the code now
runs fine, with no more blow-ups.
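
For concreteness, here is a minimal, self-contained sketch of
that workaround. It is not the real code: I am assuming here that
dtype2 is MPI_DOUBLE_PRECISION and that comm is MPI_COMM_WORLD,
and the input values are placeholders.

```fortran
program allreduce_barrier_sketch
  implicit none
  include 'mpif.h'

  ! Assumptions for illustration only (not taken from the real code):
  ! dtype2 = MPI_DOUBLE_PRECISION, comm = MPI_COMM_WORLD.
  integer :: comm, dtype2, ierr
  double precision :: energy, localbanden, t1, t2

  call MPI_INIT(ierr)
  comm   = MPI_COMM_WORLD
  dtype2 = MPI_DOUBLE_PRECISION

  ! Placeholder local contributions.
  energy      = 1.5d0
  localbanden = 2.5d0
  t1 = 0.0d0
  t2 = 0.0d0

  ! Sum each quantity over all processes in comm; the barrier
  ! after each reduction is the workaround described above.
  call MPI_ALLREDUCE(energy, t1, 1, dtype2, MPI_SUM, comm, ierr)
  call MPI_BARRIER(comm, ierr)
  call MPI_ALLREDUCE(localbanden, t2, 1, dtype2, MPI_SUM, comm, ierr)
  call MPI_BARRIER(comm, ierr)

  print *, 't1 =', t1, ' t2 =', t2

  call MPI_FINALIZE(ierr)
end program allreduce_barrier_sketch
```

With a single process, each sum should simply equal the local
input, so t1 and t2 are easy to check by hand.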

I know the MPI standard does not guarantee that
collective calls synchronize, but I have no idea
why this should have happened to me.
Please comment and advise.
Thanks a lot.

Shi

                