On Feb 16, 2006, at 5:18 PM, Robert Fiske wrote:
> I managed to get the program to launch, however after starting the
> following
> error is produced (this particular run didn't show any output from the
> program but the error remains the same) Is this still a LAM issue, or
> should I move over to the nwchem list for help?
>
> Thank you for your time and assistance
>
> Robert Fiske
>
>
> ARMCI configured for 2 cluster nodes. Network protocol is 'TCP/IP
> Sockets'.
> trying to connect to host=Mercury, port=49366
> 0:armci_CreateSocketAndConnect: gethostbyname failed: 0
> 0:armci_CreateSocketAndConnect: gethostbyname failed: 0
> Last System Error Message from Task 0:: Invalid argument
> -10000(s):armci_data_serv: unknown format code: (0,0)
> -10000(s):armci_data_serv: unknown format code: (0,0)
> Last System Error Message from Task -10000:: Invalid argument
None of these messages are from LAM -- is your application trying to
use TCP sockets for something? It looks like it is trying to and
failing: the gethostbyname errors mean that it tried to look up a
hostname and the lookup failed (perhaps an incorrect hostname
somewhere?).
> -------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 424 failed on node n1 (192.168.5.14) with exit status 1.
> -------------------------------------------------------------------------
This is the first message from LAM. However, it's somewhat odd.
LAM only shows this message if the processes in your application
invoked MPI_INIT and then quit without calling MPI_FINALIZE. That
surprised me -- based on the messages you showed above, it looked
like your application was using TCP sockets for communication (as
opposed to MPI).
In conclusion: I think your application is failing for the reasons
that it showed above, and this is not a LAM problem -- gethostbyname
is failing while your application is trying to set up TCP sockets of
its own, for something other than MPI.
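
If it helps, one way to check name resolution outside of the
application is a small script along these lines. This is only a
sketch under my own assumptions: check_hosts is a hypothetical helper
(it is not part of LAM or your application), and it uses Python's
standard socket.gethostbyname, which performs the same kind of lookup
that is failing in your output.

```python
import socket
import sys


def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None if lookup fails.

    check_hosts is a hypothetical helper for illustration only.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            results[name] = None
    return results


if __name__ == "__main__":
    # Pass the node names to test on the command line,
    # e.g. the names listed in your boot schema.
    for name, addr in check_hosts(sys.argv[1:]).items():
        print(f"{name}: {addr if addr else 'LOOKUP FAILED'}")
```

Name resolution is local to each machine, so it is worth running this
on every node with the names of all the nodes (including "Mercury"
from your output). Any name that reports a failed lookup on some node
is a likely culprit.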
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/