Thanks, I did that. I also changed a lot of other things, and it is
working now. I think one of the big problems was that I had one of the
parameters missing in my MPI_Send call. Duh.
Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945
-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of
Jeff Squyres (jsquyres)
Sent: Friday, May 26, 2006 7:41 AM
To: General LAM/MPI mailing list
Subject: Re: LAM: MPI_Recv getting garbage
I think that your mpiStatus is not large enough -- it is supposed to be
an integer array of size MPI_STATUS_SIZE.
Try changing that and see if that resolves your problem.
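For reference, here is a minimal sketch of that change as a standalone
program. It is not the poster's code: the program name status_demo and the
variables value, nprocs, and dest are made up for illustration, and it
assumes the MPI constants come from mpif.h. With a scalar mpiStatus,
MPI_Recv writes MPI_STATUS_SIZE integers through a one-integer variable, so
it can overwrite whatever sits next to it in memory, which may be why only
the first couple of received values looked wrong. The comments also spell
out the full Fortran argument lists, since a missing argument is not
necessarily caught at compile time when mpif.h is used.

```fortran
! Sketch: rank 0 sends one REAL to every other rank.
program status_demo
  implicit none
  include 'mpif.h'

  ! The status argument must be an integer ARRAY of MPI_STATUS_SIZE
  ! elements; a scalar integer lets MPI_Recv write past its end.
  integer :: mpiStatus(MPI_STATUS_SIZE)
  integer :: ierr, my_rank, nprocs, dest
  real :: value

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  if (my_rank == 0) then
    value = 1.0
    do dest = 1, nprocs - 1
      ! Fortran MPI_Send takes seven arguments:
      ! buf, count, datatype, dest, tag, comm, ierror
      call MPI_Send(value, 1, MPI_REAL, dest, 0, MPI_COMM_WORLD, ierr)
    end do
  else
    ! Fortran MPI_Recv takes eight arguments:
    ! buf, count, datatype, source, tag, comm, status, ierror
    call MPI_Recv(value, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
    write(*,*) "rank ", my_rank, " got ", value, &
               " from rank ", mpiStatus(MPI_SOURCE)
  end if

  call MPI_Finalize(ierr)
end program status_demo
```

Build it with your MPI installation's Fortran wrapper compiler and run it
with at least two processes, for example: mpirun -np 3 ./status_demo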
> -----Original Message-----
> From: lam-bounces_at_[hidden]
> [mailto:lam-bounces_at_[hidden]] On Behalf Of Adams Samuel D
> Contr AFRL/HEDR
> Sent: Tuesday, May 23, 2006 3:16 PM
> To: 'General LAM/MPI mailing list'
> Subject: LAM: MPI_Recv getting garbage
>
> For some reason I am getting some weird values when I use MPI_Recv. To
> simplify the debugging, I just send 10 MPI_REALs from 1.0 to 10.0. The
> first one comes in as 0.0, the next one comes in as garbage, and the rest
> are correct. Let me first mention that I am not really a regular Fortran
> programmer, so there could easily be something I am doing wrong with my
> Fortran, but I don't understand why I am getting garbage from my MPI_Recv
> calls. I am not sure if this is an MPI problem or, more likely, a problem
> with my code. It seems like this should be really simple.
>
> ------------------code section that is giving problems------------------
> subroutine getArbPulseArray()
> use ps_parameters
> use commona
> use aitoc
> implicit none
>
> integer :: pulseStatus, mpiStatus, i
> real, dimension(10) :: r_arr
> real :: real1, real2, real3
>
> write(*,*)"checking for a pulse... array that is!"
> call MPI_Recv(pulseStatus, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
> write(*,*)" -0 got ", pulseStatus, " my rank ", my_rank
> if(pulseStatus.eq.-2) then
> write(*,*)" -no pulse found! Abort!"
> call MPI_Abort(MPI_COMM_WORLD, ierr)
> else if(pulseStatus.eq.-1) then
> write(*,*)" -not an arbitrary pulse problem"
> else if(pulseStatus.eq.0) then
> write(*,*)" -getting pulse array"
> call MPI_Recv(real1, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
> write(*,*)" -1 got ", real1, " my rank ", my_rank, " ierror = ", ierr
> call MPI_Recv(real2, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
> write(*,*)" -2 got ",real2, " my rank ", my_rank, " ierror = ", ierr
> call MPI_Recv(real3, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
> ------------------the sender guy------------------
> subroutine readArbPulseFile(fileName)
> use ps_parameters
> use commona
> use aitoc
> implicit none
>
> character*40 :: fileName
> integer :: ioUnit = 101
> integer :: returnStatus, i
> real, dimension(10) :: r_arr = (/ (i, i=1,10) /)
> open(unit=ioUnit, file=fileName, status="old", iostat=returnStatus, form="formatted", action="read")
> if(returnStatus.ne.0) then
> write(*,*)"error: could not open file ", fileName, " (error ", returnStatus, ")"
> write(*,*)" -killing processors"
> call sendInt(-2)
> write(*,*)" -aborting"
> call MPI_Abort(MPI_COMM_WORLD, ierr)
> end if
> call sendInt(0)
> do i = 1, 10
> call sendReal(r_arr(i))
> end do
> write(*,*)"everything was good with the root."
> end subroutine
> ------------------output------------------
> Script started on Tue 23 May 2006 03:57:26 PM CDT
> mpirun -np 3 ../fdtd <modelsphere.dat -ifile ps.dat -tfile tissue.txt -air 20 -ps
>
> variable declaration complete, calling init_permit_calc
> init_permit_calc complete, initializing MPI layer
> variable declaration complete, calling init_permit_calc
> init_permit_calc complete, initializing MPI layer
> variable declaration complete, calling init_permit_calc
> init_permit_calc complete, initializing MPI layer
> MPI Layer initialized, processing command line parameters
>
> Running as point source
> Program continues WITHOUT master-node calculating.
>
> command line parameters processed, calling readparams
> checking for a pulse... array that is!
> checking for a pulse... array that is!
> Log files prefix will be Sphere101ABC10air
> --sending 0 to 1
> --sending 0 to 2
> -0 got 0 my rank 1
> -getting pulse array
> -0 got 0 my rank 2
> -getting pulse array
> --sending 1.00000000 to 1
> --sending 1.00000000 to 2
> -1 got 0.00000000E+00 my rank 1 ierror = 0
> --sending 2.00000000 to 1
> --sending 2.00000000 to 2
> -1 got 0.00000000E+00 my rank 2 ierror = 0
> --sending 3.00000000 to 1
> -2 got 5.60519386E-45 my rank 1 ierror = 0
> -2 got 5.60519386E-45 my rank 2 ierror = 0
> --sending 3.00000000 to 2
> --sending 4.00000000 to 1
> -3 got 3.00000000 my rank 1 ierror = 0
> -3 got 3.00000000 my rank 2 ierror = 0
> --sending 4.00000000 to 2
> -4 got 4.00000000 my rank 1 ierror = 0
> --sending 5.00000000 to 1
> --sending 5.00000000 to 2
> -5 got 5.00000000 my rank 1 ierror = 0
> --sending 6.00000000 to 1
> -4 got 4.00000000 my rank 2 ierror = 0
> --sending 6.00000000 to 2
> -5 got 5.00000000 my rank 2 ierror = 0
> -6 got 6.00000000 my rank 1 ierror = 0
> -6 got 6.00000000 my rank 2 ierror = 0
> --sending 7.00000000 to 1
> --sending 7.00000000 to 2
> -7 got 7.00000000 my rank 1 ierror = 0
> --sending 8.00000000 to 1
> -7 got 7.00000000 my rank 2 ierror = 0
> --sending 8.00000000 to 2
> -8 got 8.00000000 my rank 1 ierror = 0
> --sending 9.00000000 to 1
> --sending 9.00000000 to 2
> -8 got 8.00000000 my rank 2 ierror = 0
> --sending 10.0000000 to 1
> -9 got 9.00000000 my rank 2 ierror = 0
> --sending 10.0000000 to 2
> everything was good with the root.
> -9 got 9.00000000 my rank 1 ierror = 0
> -10 got 10.0000000 my rank 2 ierror = 0
> -10 got 10.0000000 my rank 1 ierror = 0
> everything was good with the node 1
> everything was good with the node 2
> jwe0019i-u The program was terminated abnormally with signal number SIGSEGV.
>
> error summary (Fortran)
> error number error level error count
> jwe0019i u 1
> total error count = 1
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (2, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - main()
> Rank (2, MPI_COMM_WORLD): - main()
> ----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 8581 failed on node n0 (127.0.0.1) with exit status 240.
> ----------------------------------------------------------------------------
> [jnorred_at_cooper test_files]$ exit
> exit
>
> Script done on Tue 23 May 2006 03:57:39 PM CDT
>
> Sam Adams
> General Dynamics - Network Systems
> Phone: 210.536.5945
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/