
LAM/MPI General User's Mailing List Archives


From: Luiz Angelo Barchet Steffenel (Luiz-Angelo.Estefanel_at_[hidden])
Date: 2005-03-08 08:38:14


Hello folks,

I'm trying to simulate a heterogeneous grid on top of a cluster, so I
intend to use IMPI to connect different "logical clusters", each with a
different number of machines and different network (SSI) support.

Here is my strange experience: at the moment, I can launch the IMPI
server and the clients ONLY IF each client has a single machine; the
number of processes (-np) doesn't matter. However, when lamboot
initialises several nodes, impirun no longer seems to connect to the
IMPI server and just waits forever...

Below you will find two scenarios: one that works, where each "cluster"
has only one machine, and one that doesn't, where lamboot initialises
many machines in each cluster. I also tried a 1xN scenario; in that
case only the single-machine side connects to the IMPI server.

I don't think it's a connectivity problem, because all machines belong
to the same physical cluster. I also made sure to launch the IMPI server
on a machine that is not part of any lamboot.
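One way to back up the "not a connectivity problem" assumption is to check, from each node listed in $OAR_FILE_NODES, that a TCP connection to the IMPI server's host and port (5555 in the logs below) actually succeeds. The following is a minimal Python sketch under that assumption; the function name and script usage are hypothetical and not part of LAM or the IMPI server. Because the real IMPI server waits for a fixed number of client connections (-server 2), probing it while it is running might interfere with it, so it is probably safer to point the check at a throwaway listener on the same host and port before starting the server.

```python
import socket
import sys


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; it does not speak the IMPI protocol,
    it only checks that the TCP handshake completes.
    """
    try:
        # create_connection resolves the host and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, name-resolution failures and timeouts
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage on each node: python check_port.py <server-host> 5555
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    print(f"{target_host}:{target_port} reachable: {port_is_reachable(target_host, target_port)}")
```

Running it on every node (for example ita10 through ita19 on the first side) would show whether any of the extra machines cannot reach the server host, which the single-machine runs would not exercise.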

Do you have any idea how to solve this problem? I'm using LAM from the
latest tarball, compiled with IMPI support, together with IMPI server
version 1.3. The cluster consists of Itanium-2 machines running Linux
kernel 2.4.21SMP ia64.

Thank you in advance,

Luiz Angelo Steffenel

*In practice:*

_*WORKING*_

[estefane_at_ita1]$ lamboot -v
LAM 7.2b1svn03082005/IMPI/ROMIO - Indiana University
n-1<20870> ssi:boot:base:linear: booting n0 (ita1)
[estefane_at_ita1]$ impirun -client 0 xxx.xx.xx.xxx:5555 N hellompi
WARNING: IMPI server requested IMPI_AUTH_NONE authentication protocol
My rank is 0
----------------------------------------------------------------------------
[estefane_at_ita2]$ lamboot -v
LAM 7.2b1svn03082005/IMPI/ROMIO - Indiana University
n-1<20870> ssi:boot:base:linear: booting n0 (ita2)
[estefane_at_ita2]$ impirun -client 0 xxx.xx.xx.xxx:5555 N hellompi
WARNING: IMPI server requested IMPI_AUTH_NONE authentication protocol
My rank is 1
----------------------------------------------------------------------------
[estefane_at_ita100]$ ./impi-server -server 2 -auth 0 -p 5555
xxx.xx.xx.xxx:5555
WARNING: Client from xxx.xx.xx.x has connected with IMPI_AUTH_NONE
WARNING: Client from xxx.xx.xx.x has connected with IMPI_AUTH_NONE

_*NOT WORKING*_

[estefane_at_ita1]$ lamboot -v $OAR_FILE_NODES
LAM 7.2b1svn03082005/IMPI/ROMIO - Indiana University
n-1<20870> ssi:boot:base:linear: booting n0 (ita1)
n-1<20870> ssi:boot:base:linear: booting n1 (ita10)
n-1<20870> ssi:boot:base:linear: booting n2 (ita11)
n-1<20870> ssi:boot:base:linear: booting n3 (ita12)
n-1<20870> ssi:boot:base:linear: booting n4 (ita13)
n-1<20870> ssi:boot:base:linear: booting n5 (ita14)
n-1<20870> ssi:boot:base:linear: booting n6 (ita15)
n-1<20870> ssi:boot:base:linear: booting n7 (ita16)
n-1<20870> ssi:boot:base:linear: booting n8 (ita17)
n-1<20870> ssi:boot:base:linear: booting n9 (ita18)
n-1<20870> ssi:boot:base:linear: booting n10 (ita19)
n-1<20870> ssi:boot:base:linear: finished
[estefane_at_ita1 impi_server-1.3]$ impirun -client 0 xxx.xx.xx.xx:5555 N hellompi
----------------------------------------------------------------------------
[estefane_at_ita2 impi_server-1.3]$ lamboot -v $OAR_FILE_NODES
LAM 7.2b1svn03082005/IMPI/ROMIO - Indiana University
n-1<20870> ssi:boot:base:linear: booting n0 (ita2)
n-1<20870> ssi:boot:base:linear: booting n1 (ita20)
n-1<20870> ssi:boot:base:linear: booting n2 (ita21)
n-1<20870> ssi:boot:base:linear: booting n3 (ita22)
n-1<20870> ssi:boot:base:linear: booting n4 (ita23)
n-1<20870> ssi:boot:base:linear: booting n5 (ita24)
n-1<20870> ssi:boot:base:linear: booting n6 (ita25)
n-1<20870> ssi:boot:base:linear: booting n7 (ita26)
n-1<20870> ssi:boot:base:linear: booting n8 (ita27)
n-1<20870> ssi:boot:base:linear: booting n9 (ita28)
n-1<20870> ssi:boot:base:linear: booting n10 (ita29)
n-1<20870> ssi:boot:base:linear: booting n11 (ita3)
n-1<20870> ssi:boot:base:linear: finished
[estefane_at_ita2]$ impirun -client 0 xxx.xx.xx.xxx:5555 N hellompi
----------------------------------------------------------------------------
[estefane_at_ita100]$ ./impi-server -server 2 -auth 0 -p 5555 -v
./impi-server -server 2 -auth 0 -p 5555 -v
server_auths[0] = 0
IMPI server version 0 started on host ita100.xxxx.xx
IMPI server listening on port 5555 for 2 connection(s).
xxx.xx.xx.xxx:5555
IMPI server: Entering main server loop.