Hi,
How can I verify that I am running the same application on all
nodes?
How do I give the full path? Please give me one example.
Thanks.
Dilani.
> On Dec 9, 2005, at 3:43 PM, Dilani Perera wrote:
>
>> 20071 out running on n15
>> --------------------------------------------------------------------------
>> It seems that rank 15 was not able to open a TCP client socket for
>> some reason. LAM is likely to abort your program shortly. :-(
>>
>> Perhaps this unix error message will help:
>>
>> Unix errno: 111
>> Connection refused
>> -------------------------------------------------------------------------
>
> I think most of the output here is a red herring.
>
>> nuthatch% ---------------------------------------
>> <
>> /bin/ksh:
>> ----------------------------------------------------------------------
>> ---:
>> not found
>> nuthatch% MPI_Recv: process in local group is dead (rank 4,
>> MPI_COMM_WORLD)
>> /bin/ksh: syntax error: `(' unexpected
>
> This message looks quite fishy to me. Why is ksh reporting an error
> here?
>
> Can you verify that you're running the same application on all
> nodes? It looks like mpirun found a shell script on at least some
> nodes and tried to treat it like an MPI application (that's a wild
> guess). Can you try running with the absolute path name of your
> executable? (I'm assuming you have a shared filesystem visible to
> all nodes)
>
> Something like:
>
> mpirun -np 16 `pwd`/my_application
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
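A minimal sketch of the two checks suggested in the quoted reply, assuming a POSIX shell and that every node can be reached with ssh (that is an assumption, not something stated in this thread). The helper names abs_path and all_same, the node names n0 and n1, and my_application are placeholders. abs_path prints the full path of a file, and all_same succeeds only when every cksum line it reads carries the same checksum and size.

```shell
#!/bin/sh
# abs_path FILE: print the absolute path of FILE (hypothetical helper).
abs_path() {
    # The cd runs inside a command substitution, so the caller's
    # working directory is not changed.
    dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename -- "$1")"
}

# all_same: read `cksum` output lines ("CRC SIZE NAME") on stdin and
# return 0 only if every line has the same CRC and SIZE.
# Empty input counts as a failure.
all_same() {
    awk '{ seen[$1 " " $2] = 1 }
         END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Possible usage (node names and program name are placeholders):
#   app=$(abs_path ./my_application)
#   for node in n0 n1; do ssh "$node" cksum "$app"; done | all_same \
#       && mpirun -np 16 "$app"
```

If all_same fails, at least one node sees a different file (or no file) at that path, which would fit the guess that some nodes are picking up a shell script instead of the MPI executable.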
Dilani Perera.
(MSC Candidate for Computational Sciences)
Department of Computer Science,
St. John's, NL
Canada,A1B 3X5
Tel: 709-737-6142 (office)
email : dilani_at_[hidden]
Visit me at : www.cs.mun.ca/~dilani