Even stranger...
If, instead of booting 30 nodes, I boot only 2 nodes, I can't use
mpirun -s at all:
valiron_at_n11 ~ > lamboot bhost
LAM 7.1.2b25/MPI 2 C++/ROMIO - Indiana University
valiron_at_n11 ~ > lamnodes
n0 n11:1:origin,this_node
n1 n12:1:
valiron_at_n11 ~ > mpirun -s n11 C rotate
mpirun: cannot start rotate on n0 (o): invalid node
If I instead issue
mpirun C rotate
it runs perfectly.
Any ideas are welcome!
Pierre.
Bogdan Costescu wrote:
>On Tue, 30 Aug 2005, Pierre Valiron wrote:
>
>>The lamboot agent failed to open a client socket to the newly-booted
>>process at IP address 192.168.11.11, port 33760.
>
>From what I understand from the code, this shows an error within
>the lamboot phase. I find it strange that you don't get this error without
>'mpirun -s', as this condition should have nothing to do with copying
>the executable by mpirun.
>
>I don't have any smart solution, but you can use the advice that I got
>from Jeff when I was struggling to get LAM/MPI running under SGE: use
>'lamboot -d' to get some debugging messages - although if you hit some
>kind of race (which is likely), the mere printing of debugging
>messages might make the problem go away...
>
>Another idea is to run 'mpirun -v -sa -p bla ...' to get some more
>details about which node fails to start.
>
>And yet another idea (I only looked at the LAM 7.0.3 code, so I
>hope it is still valid for later versions): before running lamboot,
>define in the environment ((t)csh syntax):
>
>setenv LAM_MPI_SSI_boot_base_promisc 1
>
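For Bourne-type shells (sh, bash), the setenv line quoted above would
look roughly like the sketch below. The variable name is taken as-is
from the quoted message; whether LAM versions other than 7.0.3 honor it
is an assumption, not something verified here.

```shell
#!/bin/sh
# Bourne-shell equivalent of the (t)csh line
#   setenv LAM_MPI_SSI_boot_base_promisc 1
# Assumption: the variable name is exactly the one given in the quoted message.
LAM_MPI_SSI_boot_base_promisc=1
export LAM_MPI_SSI_boot_base_promisc

# Show the value that child processes (such as lamboot) will inherit.
printf 'LAM_MPI_SSI_boot_base_promisc=%s\n' "$LAM_MPI_SSI_boot_base_promisc"
```

The variable has to be set in the same shell session before lamboot is
run, for example before 'lamboot -d bhost' as suggested in the quoted
message.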
--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/
_/_/_/_/ _/ _/ Dr. Pierre VALIRON
_/ _/ _/ _/ Laboratoire d'Astrophysique
_/ _/ _/ _/ Observatoire de Grenoble / UJF
_/_/_/_/ _/ _/ BP 53 F-38041 Grenoble Cedex 9 (France)
_/ _/ _/ http://www-laog.obs.ujf-grenoble.fr/~valiron/
_/ _/ _/ Mail: Pierre.Valiron_at_[hidden]
_/ _/ _/ Phone: +33 4 7651 4787 Fax: +33 4 7644 8821
_/ _/_/