I am using ARMCI with LAM/MPI 7.1.1 on two processors. When I run
test.x on one processor it works fine, but when I run it across two
computers I get the following error:
mpirun -np 2 -v ./test.x
7605 ./test.x running on n0 (o)
3168 ./test.x running on n1
ARMCI configured for 2 cluster nodes. Network protocol is 'TCP/IP Sockets'.
1:trying connect to host=peeyush, port=37912 t=5 111
trying to connect:: Connection refused
1:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 1:: Connection refused
1:armci_CreateSocketAndConnect: connect failed: -1
0:trying connect to host=localhost, port=32911 t=5 111
trying to connect:: Connection refused
0:armci_CreateSocketAndConnect: connect failed: -1
Last System Error Message from Task 0:: Connection refused
0:armci_CreateSocketAndConnect: connect failed: -1
-10001(s):armci_AcceptSockAll:timeout waiting for connection: 0
-10001(s):armci_AcceptSockAll:timeout waiting for connection: 0
-10000(s):armci_AcceptSockAll:timeout waiting for connection: 0
-10000(s):armci_AcceptSockAll:timeout waiting for connection: 0
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 7605 failed on node n0 (172.26.117.167) with exit status 1.
-----------------------------------------------------------------------------
I have configured LAM/MPI with the ifort and gcc compilers, and ARMCI with
mpif77 and mpicc. Can anyone please tell me what the problem is
with my TCP/IP connection?
Peeyush