LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-11-13 22:45:54


It sounds like you've got some kind of heap corruption. It's
impossible to tell without seeing the full real code -- but that'll be
an exercise left to the reader (i.e., you ;-) ). I would strongly
suggest running your application through a memory-checking debugger
such as Valgrind -- see the debugging section of the LAM FAQ for more
details.
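
One guess, assuming the part of the code you did not show sends or
receives several rows in a single MPI call (for example, a buffer
starting at &a[offset][0] with rows*NCA elements): a statically declared
double a[NRA][NCA] is one contiguous block, but rows that are malloc'ed
one at a time are not, so such a call would run past the end of a single
row. Below is a minimal sketch of an allocation that keeps the a[i][j]
syntax while storing all elements contiguously. alloc_matrix and
free_matrix are hypothetical helper names, not part of LAM or MPI.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an nrows x ncols matrix whose elements
 * live in ONE contiguous block, so &m[r][0] can be used as the start of
 * a buffer covering several consecutive rows.
 * Returns NULL if either allocation fails. */
static double **alloc_matrix(size_t nrows, size_t ncols)
{
    assert(nrows > 0 && ncols > 0);

    /* Array of row pointers. */
    double **m = malloc(nrows * sizeof *m);
    if (m == NULL)
        return NULL;

    /* One block holding every element of the matrix. */
    double *data = malloc(nrows * ncols * sizeof *data);
    if (data == NULL) {
        free(m);
        return NULL;
    }

    /* Point each row at its slice of the block. */
    for (size_t i = 0; i < nrows; i++)
        m[i] = data + i * ncols;

    return m;
}

/* Release a matrix obtained from alloc_matrix(). */
static void free_matrix(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* the contiguous data block */
        free(m);      /* the row-pointer array */
    }
}
```

With this layout, a = alloc_matrix(NRA, NCA) would replace the malloc
loop for a, and a single free_matrix(a) would replace the free loop.
Whether this is your actual problem depends on the code you left out.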

On Nov 12, 2004, at 4:56 PM, Yu-Cheng Chou wrote:

>
> I initialized the "rows", but I didn't show that part of code.
> The thing is when I declare a, b, c as double two-dimensional arrays,
> no error occurs. But when I use malloc() to allocate memory for a, b
> and c, the program doesn't work. Why?
>
>> You don't seem to initialize the "rows" value anywhere.
>>
>> for(k=0; k<NCB; k++) {
>>     for(i=0; i<rows; i++) {
>>         c[i][k] = 0.0;
>>         for(j=0; j<NCA; j++) {
>>             c[i][k] = c[i][k] + a[i][j]*b[j][k];
>>         }
>>     }
>> }
>>
>> Is "rows" supposed to be NRA?
>>
>> You're most likely walking off the end of the allocated memory.
>>
>> Damien Hocking
>>
>> Rome wasn’t built in a meeting.
>>
>>
>>
>> Yu-Cheng Chou wrote:
>>
>>> Hi,
>>> my matrix multiplication c code looks like this:
>>> ------------------------------------------------------------------------
>>> #include <mpi.h>
>>> #define NRA 300
>>> #define NCA 300
>>> #define NCB 300
>>> .
>>> .
>>> .
>>> int main(int argc, char *argv[]) {
>>>     double **a, **b, **c;
>>>     .
>>>     .
>>>     .
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
>>>     .
>>>     .
>>>     .
>>>     a = (double **)malloc(NRA*sizeof(double *));
>>>     for(i=0; i<NRA; i++) {
>>>         a[i] = (double *)malloc(NCA*sizeof(double));
>>>     }
>>>
>>>     b = (double **)malloc(NCA*sizeof(double *));
>>>     for(i=0; i<NCA; i++) {
>>>         b[i] = (double *)malloc(NCB*sizeof(double));
>>>     }
>>>
>>>     c = (double **)malloc(NRA*sizeof(double *));
>>>     for(i=0; i<NRA; i++) {
>>>         c[i] = (double *)malloc(NCB*sizeof(double));
>>>     }
>>>
>>>     if(taskid == MASTER) {
>>>         // initialize matrix a and matrix b
>>>         .
>>>         .
>>>         .
>>>         // send matrix data to worker processes
>>>         .
>>>         .
>>>         .
>>>     }
>>>     else {
>>>         // receive matrix data from master process
>>>         .
>>>         .
>>>         .
>>>     }
>>>
>>>     // for both master and worker processes -- matrix calculation
>>>     for(k=0; k<NCB; k++) {
>>>         for(i=0; i<rows; i++) {
>>>             c[i][k] = 0.0;
>>>             for(j=0; j<NCA; j++) {
>>>                 c[i][k] = c[i][k] + a[i][j]*b[j][k];
>>>             }
>>>         }
>>>     }
>>>
>>>     if(taskid == MASTER) {
>>>         // receive results from worker processes
>>>         .
>>>         .
>>>         .
>>>     }
>>>     else {
>>>         // send results to master process
>>>         .
>>>         .
>>>         .
>>>     }
>>>
>>>     // free all dynamically allocated memories
>>>     for(i=0; i<NRA; i++) {
>>>         free(a[i]);
>>>     }
>>>     free(a);
>>>
>>>     for(i=0; i<NCA; i++) {
>>>         free(b[i]);
>>>     }
>>>     free(b);
>>>
>>>     for(i=0; i<NRA; i++) {
>>>         free(c[i]);
>>>     }
>>>     free(c);
>>>
>>>     MPI_Finalize();
>>>
>>>     return 0;
>>> }
>>> ------------------------------------------------------------------------
>>>
>>> When I run this program on two machines, an error message like this
>>> came out.
>>>
>>> ------------------------------------------------------------------------
>>> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
>>> MPI_Recv: process in local group is dead (rank 0, MPI_COMM_WORLD)
>>> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
>>> Rank (0, MPI_COMM_WORLD): - main()
>>>
>>> One of the processes started by mpirun has exited with a nonzero exit
>>> code. This typically indicates that the process finished in error.
>>> If your process did not finish in error, be sure to include a "return
>>> 0" or "exit(0)" in your C code before exiting the application.
>>>
>>> PID 4672 failed on node n1 (169.237.108.13) due to signal 9.
>>> ------------------------------------------------------------------------
>>>
>>> Any hint for that?
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>
>>>
>>
>
> Best Regards,
>
> Yu-Cheng Chou
> Integration Engineering Lab
> Mechanical and Aeronautical Engineering
> University of California, Davis
>
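
On the point raised above about walking off the end of the allocated
memory: a cheap way to rule that out is to check "rows" against the
number of rows actually allocated before entering the loop. The sketch
below wraps the same triple loop in a function with such a check.
multiply_rows and its nra_alloc parameter are hypothetical names
introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: compute the first `rows` rows of c = a * b,
 * where a is (rows x nca) and b is (nca x ncb).
 * nra_alloc is the number of rows that were allocated for a and c. */
static void multiply_rows(double **a, double **b, double **c,
                          int rows, int nra_alloc, int nca, int ncb)
{
    /* Abort here, with a clear message, rather than corrupt the heap
     * later if rows is uninitialized or larger than the allocation. */
    assert(rows >= 0 && rows <= nra_alloc);

    for (int k = 0; k < ncb; k++) {
        for (int i = 0; i < rows; i++) {
            c[i][k] = 0.0;
            for (int j = 0; j < nca; j++) {
                c[i][k] += a[i][j] * b[j][k];
            }
        }
    }
}
```

In the posted program the call would be something like
multiply_rows(a, b, c, rows, NRA, NCA, NCB). Note that assert() is
compiled out when NDEBUG is defined, so this is a debugging aid, not a
substitute for running under Valgrind.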

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/