On Mar 10, 2005, at 2:43 PM, Srinivasa Prade Patri wrote:
> I am trying to run a small example program using ScaLAPACK
> routines. This example is from the tutorials on the LAM site. The
> program does a matrix-vector multiplication. I am able to compile the
> code but not able to run it.
>
> If I run it as
> mpirun -v -np 4 mv_acml
>
> I am getting the following error message:
> 1669 mv_acml running on n0
> 1413 mv_acml running on n1 (o)
> 1304 mv_acml running on n2
> 1288 mv_acml running on n3
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 1669 failed on node n0 (10.5.0.1) due to signal 11.
> -----------------------------------------------------------------------------
>
> I guess signal 11 is a segmentation fault. I am not sure why I am
> getting this.

I don't know Fortran very well, so I can't help much in debugging your
program. But signal 11 is usually a segmentation fault, which means
your program tried to access memory that it doesn't have rights to.
Generally, this comes from using a memory buffer incorrectly, for
example reading or writing past its end.
Since you are using ScaLAPACK, which has been fairly well tested over
the years, I would guess that the problem is an incorrect usage of the
ScaLAPACK interface. I would recommend using a debugger (see the LAM
FAQ for information on debugging MPI applications) to find out where
the segfault is occurring. The crash may show up inside ScaLAPACK or
MPI, in which case you will have to trace back to your application to
determine why the library was handed a memory address it had no rights
to.
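
One buffer mistake that could produce this kind of crash (a guess on my
part, since I have not seen your code) is a local array that is smaller
than the piece of the block-cyclic distribution assigned to that
process. The sketch below is in Python only to show the arithmetic; it
mirrors what ScaLAPACK's NUMROC tool function computes, and the sizes
used in the example (n = 10, nb = 3, 2 process rows) are made up for
illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long dimension, split into
    blocks of size nb, that land on process iproc in a block-cyclic
    distribution starting at process isrcproc over nprocs processes.
    Same arithmetic as ScaLAPACK's NUMROC."""
    # Distance of this process from the one that owns the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    # Number of complete blocks in the whole dimension.
    nblocks = n // nb
    # Every process gets at least this many complete blocks.
    count = (nblocks // nprocs) * nb
    # Complete blocks left over after dealing them out evenly.
    extra = nblocks % nprocs
    if mydist < extra:
        # This process receives one of the leftover complete blocks.
        count += nb
    elif mydist == extra:
        # This process receives the final, partial block (possibly empty).
        count += n % nb
    return count


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    n, nb, nprow = 10, 3, 2
    for p in range(nprow):
        print("process row", p, "needs", numroc(n, nb, p, 0, nprow), "local rows")
```

If the local array in your program is declared with fewer rows or
columns than this on any process, the library will write past the end
of it, and a segfault is a likely result. Comparing these numbers with
your array declarations and descriptor arguments is a cheap check to
make before or alongside the debugger.
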
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have an LAM/MPI day: http://www.lam-mpi.org/