LAM/MPI General User's Mailing List Archives

From: richard pan (orgilno1_at_[hidden])
Date: 2008-05-07 20:36:39


> > Hi, I'm studying parallel processing right now, and I'm kind of
> > new to this world.
> > When I test my cluster implementation by inverting a matrix of
> > 1 million by 1 million, MPI_Barrier always fails. Is there a way to
> > remove this error? I can assure you that the source code I've been
> > using doesn't have any errors, since I used it to test the same
> > matrix inversion, but only up to about 6000 by 6000.
>
> It's kind of hard to help when you don't include the error message
> that MPI_BARRIER caused. That being said, errors in a barrier are
> generally caused by an earlier problem, such as memory corruption from
> overwriting an array. You might want to use a memory debugger such as
> valgrind to make sure you don't have any issues in your code. Just
> because something works at one matrix size does not mean that it's
> correct. We've seen many cases where one matrix size works and
> another doesn't, simply because of what was placed directly after the
> array, depending on the whims of the compiler / allocator.
>
> Brian
>
> --
> Brian Barrett
> LAM/MPI Developer
> Make today a LAM/MPI day!

I have attached the code that I'm using. It would be a great help if you could help me with this problem.
The error output is as follows:
 MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
 MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
 Rank (1, MPI_COMM_WORLD): Call stack within LAM:
 Rank (2, MPI_COMM_WORLD): - MPI_Recv()
 Rank (2, MPI_COMM_WORLD): - MPI_Barrier()
 Rank (2, MPI_COMM_WORLD): - main()
 Rank (1, MPI_COMM_WORLD): - MPI_Recv()
 Rank (1, MPI_COMM_WORLD): - MPI_Barrier()
 Rank (1, MPI_COMM_WORLD): - main()
 --------------------------------------------------------------------------
 One of the processes started by mpirun has exited with a nonzero exit
 code. This typically indicates that the process finished in error.
 If your process did not finish in error, be sure to include a "return
 0" or "exit(0)" in your C code before exiting the application.

Thank you,
Sincerely,
Richard
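
A note on the sizes mentioned in this thread: a dense 1 million by 1 million matrix of 8-byte doubles needs about 8 TB, while 6000 by 6000 needs about 288 MB. If the allocation fails and the result is not checked, the rank can crash before it reaches the barrier, which would be consistent with the "process in local group is dead" output. The attached code is not shown here, so this is only a sketch under those assumptions (dense storage, double entries, plain malloc); matrix_bytes and alloc_matrix are hypothetical helper names, not part of LAM/MPI.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Bytes needed for a dense n x n matrix of doubles.
 * Returns 0 if n is 0 or if the size would overflow size_t. */
static size_t matrix_bytes(size_t n)
{
    if (n == 0 || n > SIZE_MAX / n)
        return 0;                       /* n * n would overflow */
    size_t elems = n * n;
    if (elems > SIZE_MAX / sizeof(double))
        return 0;                       /* elems * sizeof(double) would overflow */
    return elems * sizeof(double);
}

/* Checked allocation: returns NULL (with a message on stderr)
 * instead of handing back a pointer that a later write would crash on. */
static double *alloc_matrix(size_t n)
{
    size_t bytes = matrix_bytes(n);
    if (bytes == 0) {
        fprintf(stderr, "matrix size %zu is zero or too large for size_t\n", n);
        return NULL;
    }
    double *m = malloc(bytes);
    if (m == NULL)
        fprintf(stderr, "malloc of %zu bytes failed for n = %zu\n", bytes, n);
    return m;
}
```

In an MPI program, each rank would call alloc_matrix (or an equivalent check) right after computing its local size, and call MPI_Abort on MPI_COMM_WORLD if it returns NULL, so the failure is reported at its source rather than later inside MPI_Barrier. Note that on systems that overcommit memory, malloc can succeed and the process can still be killed on first touch, so running under valgrind as Brian suggested remains worthwhile.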

      ____________________________________________________________________________________
Be a better friend, newshound, and
know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ