On Tue, 30 Aug 2005, Pierre Valiron wrote:
> The lamboot agent failed to open a client socket to the newly-booted
> process at IP address 192.168.11.11, port 33760.
From what I understand of the code, this indicates an error during the
lamboot phase. I find it strange that you don't get this error without
'mpirun -s', as this condition should have nothing to do with mpirun
copying the executable.
I don't have any smart solution, but you can use the advice that I got
from Jeff when I was struggling to get LAM/MPI running under SGE: use
'lamboot -d' to get some debugging messages - although if you hit some
kind of race (which is likely), the mere printing of debugging
messages might make the problem go away...
Another idea is to run 'mpirun -v -sa -p bla ...' to get some more
details about what node fails to start.
Yet another idea, though I only looked at the LAM 7.0.3 code and can
only hope it still holds for later versions: before running lamboot,
define in the environment ((t)csh syntax):
setenv LAM_MPI_SSI_boot_base_promisc 1
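For Bourne-type shells (sh, bash, ksh), which the setenv line above does
not cover, the equivalent would presumably be the following. This is only
the standard sh way of exporting a variable, nothing LAM-specific; the
variable name and value are the ones given above, and whether the setting
still has any effect in versions after 7.0.3 remains an assumption:

```shell
# sh/bash/ksh equivalent of the (t)csh setenv line above.
# Assignment and export are kept separate so it also works in a plain sh.
LAM_MPI_SSI_boot_base_promisc=1
export LAM_MPI_SSI_boot_base_promisc
```

Set it in the same shell session from which lamboot is started, so that
lamboot inherits it.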
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]