
LAM/MPI General User's Mailing List Archives


From: Adams Samuel D Contr AFRL/HEDR (Samuel.Adams.ctr_at_[hidden])
Date: 2006-05-23 15:15:36


For some reason I am getting some weird values when I use MPI_Recv. To
simplify the debugging, I just send 10 MPI_REALs, from 1.0 to 10.0. The first
one comes in as 0.0, the next one comes in as garbage, and the rest are
correct. Let me first mention that I am not really a regular Fortran
programmer, so I could easily be doing something wrong in my Fortran,
but I don't understand why I am getting garbage from my MPI_Recv calls.
I am not sure if this is an MPI problem or, more likely, a problem
with my code. It seems like this should be really simple.

---------------------------code section that is giving problems-------------
subroutine getArbPulseArray()
   use ps_parameters
   use commona
   use aitoc
   implicit none
   
   integer :: pulseStatus, mpiStatus, i
   real, dimension(10) :: r_arr
   real :: real1, real2, real3

   write(*,*)"checking for a pulse... array that is!"
   call MPI_Recv(pulseStatus, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
   write(*,*)" -0 got ", pulseStatus, " my rank ", my_rank
   if(pulseStatus.eq.-2) then
      write(*,*)" -no pulse found! Abort!"
      call MPI_Abort(MPI_COMM_WORLD, ierr)
   else if(pulseStatus.eq.-1) then
      write(*,*)" -not an arbitrary pulse problem"
   else if(pulseStatus.eq.0) then
      write(*,*)" -getting pulse array"
      call MPI_Recv(real1, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
      write(*,*)" -1 got ", real1, " my rank ", my_rank, " ierror = ", ierr
      call MPI_Recv(real2, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
      write(*,*)" -2 got ",real2, " my rank ", my_rank, " ierror = ", ierr
      call MPI_Recv(real3, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
------------------the sender guy-------------------------------------------
subroutine readArbPulseFile(fileName)
   use ps_parameters
   use commona
   use aitoc
   implicit none

   character*40 :: fileName
   integer :: ioUnit = 101
   integer :: returnStatus, i
   real, dimension(10) :: r_arr = (/ (i, i=1,10) /)
   open(unit=ioUnit, file=fileName, status="old", iostat=returnStatus, form="formatted", action="read")
   if(returnStatus.ne.0) then
      write(*,*)"error: could not open file ", fileName, " (error ", returnStatus, ")"
      write(*,*)" -killing processors"
      call sendInt(-2)
      write(*,*)" -aborting"
      call MPI_Abort(MPI_COMM_WORLD, ierr)
   end if
   call sendInt(0)
   do i = 1, 10
      call sendReal(r_arr(i))
   end do
   write(*,*)"everything was good with the root."
end subroutine
------------------output----------------------------------------------------
Script started on Tue 23 May 2006 03:57:26 PM CDT
 mpirun -np 3 ../fdtd <modelsphere.dat -ifile ps.dat -tfile tissue.txt -air 20 -ps

 variable declaration complete, calling init_permit_calc
 init_permit_calc complete, initializing MPI layer
 variable declaration complete, calling init_permit_calc
 init_permit_calc complete, initializing MPI layer
 variable declaration complete, calling init_permit_calc
 init_permit_calc complete, initializing MPI layer
 MPI Layer initialized, processing command line parameters
 
 Running as point source
 Program continues WITHOUT master-node calculating.
 
 command line parameters processed, calling readparams
 checking for a pulse... array that is!
 checking for a pulse... array that is!
 Log files prefix will be Sphere101ABC10air
 --sending 0 to 1
 --sending 0 to 2
    -0 got 0 my rank 1
    -getting pulse array
    -0 got 0 my rank 2
    -getting pulse array
 --sending 1.00000000 to 1
 --sending 1.00000000 to 2
    -1 got 0.00000000E+00 my rank 1 ierror = 0
 --sending 2.00000000 to 1
 --sending 2.00000000 to 2
    -1 got 0.00000000E+00 my rank 2 ierror = 0
 --sending 3.00000000 to 1
    -2 got 5.60519386E-45 my rank 1 ierror = 0
    -2 got 5.60519386E-45 my rank 2 ierror = 0
 --sending 3.00000000 to 2
 --sending 4.00000000 to 1
    -3 got 3.00000000 my rank 1 ierror = 0
    -3 got 3.00000000 my rank 2 ierror = 0
 --sending 4.00000000 to 2
    -4 got 4.00000000 my rank 1 ierror = 0
 --sending 5.00000000 to 1
 --sending 5.00000000 to 2
    -5 got 5.00000000 my rank 1 ierror = 0
 --sending 6.00000000 to 1
    -4 got 4.00000000 my rank 2 ierror = 0
 --sending 6.00000000 to 2
    -5 got 5.00000000 my rank 2 ierror = 0
    -6 got 6.00000000 my rank 1 ierror = 0
    -6 got 6.00000000 my rank 2 ierror = 0
 --sending 7.00000000 to 1
 --sending 7.00000000 to 2
    -7 got 7.00000000 my rank 1 ierror = 0
 --sending 8.00000000 to 1
    -7 got 7.00000000 my rank 2 ierror = 0
 --sending 8.00000000 to 2
    -8 got 8.00000000 my rank 1 ierror = 0
 --sending 9.00000000 to 1
 --sending 9.00000000 to 2
    -8 got 8.00000000 my rank 2 ierror = 0
 --sending 10.0000000 to 1
    -9 got 9.00000000 my rank 2 ierror = 0
 --sending 10.0000000 to 2
 everything was good with the root.
    -9 got 9.00000000 my rank 1 ierror = 0
    -10 got 10.0000000 my rank 2 ierror = 0
    -10 got 10.0000000 my rank 1 ierror = 0
 everything was good with the node 1
 everything was good with the node 2
jwe0019i-u The program was terminated abnormally with signal number SIGSEGV.

error summary (Fortran)
error number  error level  error count
  jwe0019i         u            1
total error count = 1
MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (2, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - main()
Rank (2, MPI_COMM_WORLD): - main()
----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 8581 failed on node n0 (127.0.0.1) with exit status 240.
----------------------------------------------------------------------------
[jnorred_at_cooper test_files]$ exit
exit

Script done on Tue 23 May 2006 03:57:39 PM CDT
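
One thing that may be worth checking in the receiver above: mpiStatus is declared as a plain integer, but in the Fortran bindings the status argument of MPI_Recv is an integer array of size MPI_STATUS_SIZE. With a scalar, MPI_Recv can write past the end of the variable and into whatever sits next to it, which would be consistent with the first two reals being clobbered (5.60519386E-45 is what the integer 4 looks like when its bits are read as a single-precision real). This is only a guess, since the declarations inside ps_parameters, commona and aitoc are not shown. A minimal sketch of the declaration, assuming mpif.h is available and using a hypothetical standalone program name:

```fortran
program recv_status_sketch
   implicit none
   include 'mpif.h'

   integer :: ierr, my_rank, nprocs, dest
   ! The status argument must be an array of MPI_STATUS_SIZE integers,
   ! not a scalar integer.
   integer :: mpiStatus(MPI_STATUS_SIZE)
   real :: value

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   if (my_rank == 0) then
      ! Rank 0 sends one real to every other rank.
      value = 1.0
      do dest = 1, nprocs - 1
         call MPI_Send(value, 1, MPI_REAL, dest, 0, MPI_COMM_WORLD, ierr)
      end do
   else
      ! Every other rank receives that real from rank 0.
      call MPI_Recv(value, 1, MPI_REAL, 0, 0, MPI_COMM_WORLD, mpiStatus, ierr)
      write(*,*) " got ", value, " my rank ", my_rank, " ierror = ", ierr
   end if

   call MPI_Finalize(ierr)
end program recv_status_sketch
```

Separately, the Fortran binding of MPI_Abort takes three arguments (communicator, error code, ierror), so the two-argument calls in the code above may need an error code added. Whether either point explains the SIGSEGV at the end of the run is not something the output alone can confirm.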

Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945



  • application/octet-stream attachment: debug