LAM/MPI General User's Mailing List Archives

From: Priscila Saito (priscilasaito_at_[hidden])
Date: 2007-07-25 14:44:56


I am having problems with mpiJava (which uses the MPICH2 implementation).

When packing the pixels of 11 MB images, the program works very well, but
with the pixels of 21 MB images it does not work correctly.
With 3 PCs and 3 processes, it's OK! :) But with 3 PCs and 4 processes,
the following errors occur:

 mpirun -np 4 java -Xmx300M Med21MB3x3

[cli_2]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0xb0c35008, count=14617628, MPI_BYTE, src=0, tag=902, MPI_COMM_WORLD, status=0x876fa50) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)

[cli_1]: aborting job:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=0xb0c1d008, count=14606208, MPI_BYTE, src=0, tag=901, MPI_COMM_WORLD, status=0x9a81818) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)

rank 2 in job 10 lab07_15_33967 caused collective abort of all ranks
  exit status of rank 2: return code 1
rank 1 in job 10 lab07_15_33967 caused collective abort of all ranks
  exit status of rank 1: return code 1
rank 0 in job 10 lab07_15_33967 caused collective abort of all ranks
  exit status of rank 0: killed by signal 9

What's wrong?

Thanks,

Priscila.

-- 
(>'''''<)
(  ' ; ' )
(@)(@)  Prí