I'm having problems with mpiJava (which uses the MPICH2 implementation).
 

When packing the pixels of 11 MB images, the program works very well, but when packing the pixels of 21 MB images, the same program doesn't work correctly.

With 3 PCs and 3 processes, it's OK! :) But with 3 PCs and 4 processes, the following errors occur:

 

mpirun -np 4 java -Xmx300M Med21MB3x3

 

[cli_2]: aborting job:

Fatal error in MPI_Recv: Other MPI error, error stack:

MPI_Recv(186).............................: MPI_Recv(buf=0xb0c35008, count=14617628, MPI_BYTE, src=0, tag=902, MPI_COMM_WORLD, status=0x876fa50) failed

MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()

MPIDI_CH3I_Progress_handle_sock_event(413):

MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)

[cli_1]: aborting job:

Fatal error in MPI_Recv: Other MPI error, error stack:

MPI_Recv(186).............................: MPI_Recv(buf=0xb0c1d008, count=14606208, MPI_BYTE, src=0, tag=901, MPI_COMM_WORLD, status=0x9a81818) failed

MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()

MPIDI_CH3I_Progress_handle_sock_event(413):

MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)

rank 2 in job 10  lab07_15_33967   caused collective abort of all ranks

  exit status of rank 2: return code 1

rank 1 in job 10  lab07_15_33967   caused collective abort of all ranks

  exit status of rank 1: return code 1

rank 0 in job 10  lab07_15_33967   caused collective abort of all ranks

  exit status of rank 0: killed by signal 9

 

What's wrong?

 

Thanks,

 

Priscila.


 



--
(>'''''<)  
(  ' ; ' )  
(@)(@)  Prí