And you just might be suffering from heap fragmentation, if you are frequently allocating and freeing blocks of many different sizes.
Quick fix: allocate fixed-size chunks of some maximum size, so that any freed chunk can satisfy any later request.
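
To illustrate the fixed-size idea, here is a minimal sketch in C. It is not LAM code and not necessarily what the application above should use as-is: the names (pool_t, pool_init, pool_alloc, pool_free, pool_destroy) are hypothetical, and CHUNK_SIZE is an assumed placeholder that would have to be set to the largest buffer the program really needs. All chunks come from one up-front malloc(), and freed chunks go onto a free list, so the general-purpose heap never sees the mixed-size traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed maximum buffer size (1 MiB); tune to the real workload. */
#define CHUNK_SIZE (1u << 20)

/* Every chunk has the same size, so any free chunk fits any request. */
typedef union chunk {
    union chunk  *next;               /* link while on the free list */
    unsigned char bytes[CHUNK_SIZE];  /* payload while in use */
} chunk_t;

typedef struct {
    chunk_t *slab;       /* one contiguous allocation holding all chunks */
    chunk_t *free_list;  /* singly linked list of unused chunks */
    size_t   nchunks;
} pool_t;

/* Allocate all chunks once, up front. Returns 0 on success, -1 on failure. */
static int pool_init(pool_t *p, size_t nchunks)
{
    if (nchunks == 0 || nchunks > SIZE_MAX / sizeof(chunk_t))
        return -1;
    p->slab = malloc(nchunks * sizeof(chunk_t));
    if (p->slab == NULL)
        return -1;
    p->nchunks = nchunks;
    p->free_list = NULL;
    for (size_t i = 0; i < nchunks; i++) {
        p->slab[i].next = p->free_list;
        p->free_list = &p->slab[i];
    }
    return 0;
}

/* Hand out one chunk; NULL if the request is too big or the pool is empty. */
static void *pool_alloc(pool_t *p, size_t size)
{
    if (size > CHUNK_SIZE || p->free_list == NULL)
        return NULL;
    chunk_t *c = p->free_list;
    p->free_list = c->next;
    return c->bytes;
}

/* Return a chunk to the pool so a later request can reuse it. */
static void pool_free(pool_t *p, void *ptr)
{
    chunk_t *c = ptr;
    assert(c >= p->slab && c < p->slab + p->nchunks);
    c->next = p->free_list;
    p->free_list = c;
}

/* Release the whole pool in one call. */
static void pool_destroy(pool_t *p)
{
    free(p->slab);
    p->slab = NULL;
    p->free_list = NULL;
    p->nchunks = 0;
}
```

The trade-off is wasted space: a 100-byte request still consumes a full chunk, so this only pays off when most buffers are close to the maximum size or when the pool is small relative to available RAM.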
Vince Virgilio
> -----Original Message-----
> From: lam-bounces_at_[hidden]
> [mailto:lam-bounces_at_[hidden]] On Behalf Of Amey Dharurkar
> Sent: Thursday, October 09, 2003 1:26 PM
> To: General LAM/MPI mailing list
> Subject: Re: LAM: MPI_Recv: unclassified: Cannot allocate memory
>
>
>
> Hello,
>
> > Hi,
> >
> > under which circumstances does LAM throw an error message like this:
> > ---snip---
> > Frequency Step Number 22
> > Frequency Step Number 23
> > Frequency Step Number 24
> > MPI_Recv: unclassified: Cannot allocate memory (rank 3, comm 4)
> > Rank (7, MPI_COMM_WORLD): Call stack within LAM:
> > Rank (7, MPI_COMM_WORLD): - MPI_Recv()
> > Rank (7, MPI_COMM_WORLD): - main()
> > MPI_Recv: unclassified: Cannot allocate memory (rank 3, comm 4)
> > Rank (11, MPI_COMM_WORLD): Call stack within LAM:
> > Rank (11, MPI_COMM_WORLD): - MPI_Recv()
> > Rank (11, MPI_COMM_WORLD): - main()
> > MPI_Recv: unclassified: Cannot allocate memory (rank 2, comm 4)
> > Rank (14, MPI_COMM_WORLD): Call stack within LAM:
> > Rank (14, MPI_COMM_WORLD): - MPI_Recv()
> > Rank (14, MPI_COMM_WORLD): - main()
> > MPI_Recv: unclassified: Cannot allocate memory (rank 1, comm 4)
> > Rank (13, MPI_COMM_WORLD): Call stack within LAM:
> > Rank (13, MPI_COMM_WORLD): - MPI_Recv()
> > Rank (13, MPI_COMM_WORLD): - main()
> >
> > -----------------------------------------------------------------------------
> >
> > One of the processes started by mpirun has exited with a nonzero exit
> > code. This typically indicates that the process finished in error.
> > If your process did not finish in error, be sure to include a "return
> > 0" or "exit(0)" in your C code before exiting the application.
> >
> > PID 5096 failed on node n4 with exit status 1.
> >
> > -----------------------------------------------------------------------------
> > ---snip---
> >
> > "Cannot allocate memory" is obvious. But on a node with 2GB RAM, no other
> > procs, and a local matrix size of 142MB? LAM version is 6.5.9, taken from
> > SuSE Linux 8.2. The failing program is used for parallel matrix setup and
> > decomposition using BLACS and ScaLAPACK on a 16-node P4 cluster system.
> >
> > Any comments appreciated.
>
> The error is being thrown because LAM is unable to complete a malloc()
> internally. Since you are dealing with very large messages and data
> structures, memory is being used up rapidly, and that is why you are
> getting the error.
>
> I suggest that you should try to check the program using memory-checking
> debuggers (valgrind/bcheck) for obvious problems. Also note that you need
> to recompile LAM --with-purify.
>
> Hope this helps.
>
> ---------------
> Amey Dharurkar
> Graduate Student, Indiana University.
>
> >
> > Regards,
> > Michael
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >
>