Sorry for the delay in responding. We have been trying to figure out
exactly why this error would occur.
The error message is emanating from the usysv rpi module
<share/ssi/rpi/usysv/src/ssi_rpi_usysv_common.c>, specifically the
following chunk of code in <shmlock()>:
--------------------------
do {
if (semop(semaphores, &shm_lock, 1) == 0) {
return;
} else if (errno != EINTR) {
lam_err_comm(MPI_COMM_NULL, MPI_ERR_OTHER, errno,
"locking shared memory area");
}
} while (1);
--------------------------
It seems that the semop() call failed while trying to lock the shared
memory region. Note that the loop above only retries when errno is
EINTR; any other error aborts with the message you saw. The
"Identifier removed" text in your error suggests errno was EIDRM,
i.e., the semaphore set was removed out from under the process. :(
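For reference, here is a minimal standalone sketch of the same idiom
(this is not LAM source; the names and setup are made up for
illustration). It shows a System V semaphore used as a lock, retrying
semop() on EINTR and treating anything else, such as EIDRM, as fatal:
--------------------------
#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Some systems require the caller to define this for semctl(). */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main(void)
{
    /* Create a private, single-semaphore set, initialized to 1 ("unlocked"). */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) { perror("semget"); return 1; }
    union semun arg; arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) < 0) { perror("semctl"); return 1; }

    /* Lock: atomically decrement semaphore 0; blocks until it is > 0. */
    struct sembuf lock = { 0, -1, 0 };
    while (semop(semid, &lock, 1) != 0) {
        if (errno != EINTR) {   /* EINTR just means "interrupted; retry". */
            perror("locking");  /* EIDRM here => the set was removed.     */
            semctl(semid, 0, IPC_RMID, arg);
            return 1;
        }
    }
    printf("locked\n");

    /* Unlock: increment semaphore 0 back to 1. */
    struct sembuf unlock = { 0, 1, 0 };
    if (semop(semid, &unlock, 1) != 0)
        perror("unlocking");

    semctl(semid, 0, IPC_RMID, arg);  /* Remove the semaphore set. */
    return 0;
}
--------------------------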
Were system resources maxed out when the error occurred (too many
running processes, lots of active memory use, etc.)? Have you been able
to reproduce it since? What operating environment are you using (OS,
hardware, software, LAM/MPI version, etc.)?
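One thing worth checking: on most systems "ipcs -s" will list the
System V semaphore sets currently in use (and on Linux, "ipcs -l"
shows the kernel IPC limits). That may tell you whether you were
bumping up against a semaphore limit, or whether something cleaned up
LAM's semaphores while your job was still running.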
As a short-term solution, you could try avoiding the usysv rpi module
and see if that helps. The sysv rpi module is a good substitute.
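If you are running LAM 7.x, you can select the rpi module on the
mpirun command line via the SSI flags, e.g. something like:

  mpirun -ssi rpi sysv -np 9 ./your_program

(with ./your_program standing in for your own executable, and -np set
to however many processes you need).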
Josh
On Mar 16, 2005, at 4:21 PM, William Brian Lane wrote:
>
> Hello. I was running the ScaLAPACK subroutine pdstebz on a 3x3 grid
> and received the following message:
>
> MPI_Recv: unclassified: Identifier removed: locking shared memory area
> (rank 0, MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> Rank (0, MPI_COMM_WORLD): - MPI_Reduce()
> Rank (0, MPI_COMM_WORLD): - MPI_Allreduce()
> Rank (0, MPI_COMM_WORLD): - MPI_Comm_create()
> Rank (0, MPI_COMM_WORLD): - main()
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 29616 failed on node n0 (128.227.64.23) with exit status 1.
> -----------------------------------------------------------------------------
>
>
> I was wondering if anyone could tell me what it means and how I might
> be able to fix it. Thanks!
>
>
> Take care,
>
> Brian Lane
> blane_at_[hidden]
> http://plaza.ufl.edu/blane116
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/