Hi,
Under which circumstances does LAM throw an error message like this:
---snip---
Frequency Step Number 22
Frequency Step Number 23
Frequency Step Number 24
MPI_Recv: unclassified: Cannot allocate memory (rank 3, comm 4)
Rank (7, MPI_COMM_WORLD): Call stack within LAM:
Rank (7, MPI_COMM_WORLD): - MPI_Recv()
Rank (7, MPI_COMM_WORLD): - main()
MPI_Recv: unclassified: Cannot allocate memory (rank 3, comm 4)
Rank (11, MPI_COMM_WORLD): Call stack within LAM:
Rank (11, MPI_COMM_WORLD): - MPI_Recv()
Rank (11, MPI_COMM_WORLD): - main()
MPI_Recv: unclassified: Cannot allocate memory (rank 2, comm 4)
Rank (14, MPI_COMM_WORLD): Call stack within LAM:
Rank (14, MPI_COMM_WORLD): - MPI_Recv()
Rank (14, MPI_COMM_WORLD): - main()
MPI_Recv: unclassified: Cannot allocate memory (rank 1, comm 4)
Rank (13, MPI_COMM_WORLD): Call stack within LAM:
Rank (13, MPI_COMM_WORLD): - MPI_Recv()
Rank (13, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 5096 failed on node n4 with exit status 1.
-----------------------------------------------------------------------------
---snip---
"Cannot allocate memory" is clear enough in itself. But on a node with 2GB RAM, no
other processes, and a local matrix size of 142MB? LAM version is 6.5.9, taken from
SuSE Linux 8.2. The failing program is used for parallel matrix setup and
decomposition using BLACS and ScaLAPACK on a 16-node P4 cluster system.
Any comments appreciated.
Regards,
Michael