Hello,
> Hi,
>
> under which circumstances does LAM throw an error message like this:
> ---snip---
> Frequency Step Number 22
> Frequency Step Number 23
> Frequency Step Number 24
> MPI_Recv: unclassified: Cannot allocate memory (rank 3, comm 4)
> Rank (7, MPI_COMM_WORLD): Call stack within LAM:
> Rank (7, MPI_COMM_WORLD): - MPI_Recv()
> Rank (7, MPI_COMM_WORLD): - main()
> MPI_Recv: unclassified: Cannot allocate memory (rank 3, comm 4)
> Rank (11, MPI_COMM_WORLD): Call stack within LAM:
> Rank (11, MPI_COMM_WORLD): - MPI_Recv()
> Rank (11, MPI_COMM_WORLD): - main()
> MPI_Recv: unclassified: Cannot allocate memory (rank 2, comm 4)
> Rank (14, MPI_COMM_WORLD): Call stack within LAM:
> Rank (14, MPI_COMM_WORLD): - MPI_Recv()
> Rank (14, MPI_COMM_WORLD): - main()
> MPI_Recv: unclassified: Cannot allocate memory (rank 1, comm 4)
> Rank (13, MPI_COMM_WORLD): Call stack within LAM:
> Rank (13, MPI_COMM_WORLD): - MPI_Recv()
> Rank (13, MPI_COMM_WORLD): - main()
> -----------------------------------------------------------------------------
>
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 5096 failed on node n4 with exit status 1.
> -----------------------------------------------------------------------------
> ---snip---
>
> "Cannot allocate memory" is obvious. But on a node with 2GB RAM, no other
> procs, and a local matrix size of 142MB? LAM version is 6.5.9, taken from
> SuSE Linux 8.2. The failing program is used for parallel matrix setup and
> decomposition using BLACS and ScaLAPACK on a 16-node P4 cluster system.
>
> Any comments appreciated.
The error is being thrown because LAM is unable to complete a malloc()
internally. Since you are dealing with very large messages and data
structures, memory is probably being used up rapidly, which would explain
the error.
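To illustrate where that message text comes from, here is a minimal sketch
(not LAM's actual code): on Linux, "Cannot allocate memory" is the string
the C library normally reports for ENOMEM, which is what a failed malloc()
sets. The helper name checked_malloc() is hypothetical; wrapping your own
large allocations like this can help tell whether your code or the library
runs out of memory first.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of LAM or MPI: allocate nbytes and,
 * if the allocation fails, print the reason reported by the C library.
 * Returns the pointer from malloc(), which is NULL on failure. */
void *checked_malloc(size_t nbytes, const char *what)
{
    void *p;

    assert(what != NULL);      /* a label is required for the message */

    errno = 0;
    p = malloc(nbytes);
    if (p == NULL && nbytes != 0) {
        /* Fall back to ENOMEM if the C library left errno untouched. */
        int err = (errno != 0) ? errno : ENOMEM;
        fprintf(stderr, "%s: malloc(%zu) failed: %s\n",
                what, nbytes, strerror(err));
    }
    return p;
}
```

Called as checked_malloc(n * sizeof(double), "local matrix"), it returns
the buffer on success, or prints a diagnostic and returns NULL on failure.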
I suggest checking the program with a memory-checking debugger
(valgrind/bcheck) for obvious problems. Note that for this you need to
recompile LAM, configured --with-purify.
Hope this helps.
---------------
Amey Dharurkar
Graduate Student, Indiana University.
>
> Regards,
> Michael
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>