On May 19, 2006, at 11:23 AM, Barry McInnes wrote:
> We are on a 50-node G5 (Mac OS X 10.4.6) cluster, and had used a b30
> LAM/MPI for a while. We recently tried b34 (both built from source), and
> still get the same errors in lamtest, just for tcp.
> It passes most tests but then starts failing with
>
> PASS: client_server
> mpirun -x TEST -ssi cr none -s h C -ssi rpi crtcp /Users/bjm/lamtests-7.1.2b34/dynamic/./comm_join
> [**ERROR**]: LAM/MPI MPI_COMM_WORLD rank 1, file comm_join.c:143:
> ERROR: Client could not connect() properly
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> Are these fixable, or do we ignore them?
I think we tracked this error down recently, and it was localized to
the MPI-2 dynamic process functions (accept, connect, and join, to be
specific). If you don't need these functions, it is safe to ignore
the errors. If you do need those functions, I'm releasing a LAM
7.1.3 beta this weekend that should have a fix for this issue.
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/