
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-01-10 14:55:21


Sorry for the delay -- a bunch of mails slipped by me during the
holidays.

There are actually 2 problems here:

1. You can't run 2 IMPI clients from the same host. If that's not in
the documentation, it needs to be. This is the specific problem that
you're seeing -- two IMPI clients have the same IP address, which
causes confusion during internal sanity checks. So split the impirun
invocations across multiple hosts and you should be ok (see the
example after point 2).

2. I forgot to change one additional constant deep within the IMPI code
that basically hosed collectives across IMPI communicators (including
MPI_COMM_WORLD). I committed that fix to Subversion within the last
hour or two.
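
For problem #1, here is a sketch of a working layout (the hostnames
node0/node1/node2 are hypothetical; the impi_server arguments are the
same ones from your transcript below):

  node0$ ./impi_server -server 2 -auth 0 -port 5555
  node1$ impirun -client 0 node0:5555 N hello
  node2$ impirun -client 1 node0:5555 N hello

The important part is that the two impirun invocations run on
different hosts, so the IMPI server sees two distinct IP addresses.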

Brian also noticed that our web page showing the nightly SVN tarballs
was broken; I fixed that so that the latest is now showing (including
the fixes for #2, above). SVN r10011 on both the trunk and
branches/branch-7-1 should work for you.
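
If you'd rather build from a checkout than wait for the snapshot,
something along these lines should pin that revision (repository URL
omitted here -- substitute the one from the LAM/MPI web site):

  svn co -r 10011 <LAM/MPI SVN URL>/branches/branch-7-1 lam-branch-7-1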

Let me know if this works for you.

On Dec 29, 2004, at 9:46 AM, Luiz Angelo Barchet-Steffenel wrote:

> Hi Jeff!
>
> Thank you for your help; this time I successfully compiled LAM with
> IMPI support from the snapshot version. However, it still fails with
> the same error I found on LAM 7.0.4, the "truncation of socket"
> problem.
>
> The application is a simple HelloWorld, but this error also occurs
> when the application contains only MPI_Init and MPI_Finalize. Do you
> have an idea why this happens, and whether this error can be
> corrected in the next versions (in the next snapshot, for example ;) ?
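>
> For reference, the whole test boils down to something like this
> (a sketch -- it assumes LAM's mpicc wrapper is in the path):
>
>   cat > hello.c <<'EOF'
>   #include <mpi.h>
>   int main(int argc, char **argv) {
>       /* Init and Finalize alone are enough to trigger the error */
>       MPI_Init(&argc, &argv);
>       MPI_Finalize();
>       return 0;
>   }
>   EOF
>   mpicc hello.c -o hello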
>
> Below you can find the output from the server and the clients. I also
> added the full (-v) output from the server at the end of this email. I
> hope these outputs can help you.
>
>
> Thank you, and have a Happy New Year :)
>
>
> Luiz Angelo Steffenel
>
>
>
> Server:
> ./impi_server -server 2 -auth 0 -port 5555
> xxx.xxx.xxx.xxx:5555
> WARNING: Client from 127.0.0.1 has connected with IMPI_AUTH_NONE
> WARNING: Client from 127.0.0.1 has connected with IMPI_AUTH_NONE
> Read from client -- client socket closed!
> Client 0 has hung up before sending IMPI_CMD_FINI.  How anti-social.
> Read from client -- client socket closed!
> Client 1 has hung up before sending IMPI_CMD_FINI.  How anti-social.
>
> Client 0:
> 
> WARNING: IMPI server requested IMPI_AUTH_NONE authentication protocol
> Remote IMPI host (my index 0) has a different index number for itself
> (1): Success
> MPI_Recv: process in local group is dead (rank 0, IMPID
> intracommunicator)
>
> Client 1:
> impirun -client 1 localhost:5555 N hello
> WARNING: IMPI server requested IMPI_AUTH_NONE authentication protocol
> LAM IMPI client: Unexpected truncation of socket read (IMPI_Packet header)
> Error on IMPI host 0.  Closing connection (good luck).
> LAM IMPI client: Unexpected truncation of socket read (IMPI_Packet header)
>   [the previous line repeats 10 times in all]
> socket write: Bad file descriptor
> LAM IMPI host was not able to send an entire IMPI_Packet down a
> socket to host 0
> LAM IMPI client and host aborting...
> MPI_Wait: process in local group is dead (rank 1, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD):  - MPI_Wait()
> Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (1, MPI_COMM_WORLD):  - MPI_Barrier()
> Rank (1, MPI_COMM_WORLD):  - MPI_Finalize()
> Rank (1, MPI_COMM_WORLD):  - main()
>
>
> Server full output (-v):
> ./impi_server -server 2 -auth 0 -port 5555 -v
> server_auths[0] = 0
> IMPI server version 0 started on host jicaque
> IMPI server listening on port 5555 for 2 connection(s).
> xxx.xxx.xxx.xxx:5555
> IMPI server: Entering main server loop.
> IMPI server: Accept from: 127.0.0.1
> IMPI server: A client has successfully connected
> Waiting for IMPI_CMD_AUTH from client
> Read inital auth from client.
> Waiting for IMPI_Client_auth from client
> Got auth mask from client: 0x1
> Checking server_auth[0]: 0 (IMPI_AUTH_NONE)
> Server and client have agreed on IMPI_AUTH_NONE.
> Sending IMPI_Server_auth to client, 0.
> WARNING: Client from 127.0.0.1 has connected with IMPI_AUTH_NONE
> Sending IMPI_CMD_IMPI to client 1
> Sending num_clients = 2 to client 1
> IMPI server: Accept from: 127.0.0.1
> IMPI server: A client has successfully connected
> Waiting for IMPI_CMD_AUTH from client
> Read inital auth from client.
> Waiting for IMPI_Client_auth from client
> Got auth mask from client: 0x1
> Checking server_auth[0]: 0 (IMPI_AUTH_NONE)
> Server and client have agreed on IMPI_AUTH_NONE.
> Sending IMPI_Server_auth to client, 0.
> WARNING: Client from 127.0.0.1 has connected with IMPI_AUTH_NONE
> Sending IMPI_CMD_IMPI to client 0
> Sending num_clients = 2 to client 0
> Setting poll_clients[i].fd to 5
> Setting poll_clients[i].fd to 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x1000 from client 1
>         Bytes to be read 8
> last_sent = -1, IMPI_INDEX_MAX = 15
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x1000 from client 0
>         Bytes to be read 8
> last_sent = -1, IMPI_INDEX_MAX = 15
> Sending COLL:0x100000 to clients
>         Bytes of coll data to be sent 16
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x1100 from client 1
>         Bytes to be read 4
> last_sent = 1, IMPI_INDEX_MAX = 15
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x1100 from client 0
>         Bytes to be read 4
> last_sent = 1, IMPI_INDEX_MAX = 15
> Sending COLL:0x110000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x1200 from client 1
>         Bytes to be read 4
> last_sent = 2, IMPI_INDEX_MAX = 15
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x1200 from client 0
>         Bytes to be read 4
> last_sent = 2, IMPI_INDEX_MAX = 15
> Sending COLL:0x120000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x1300 from client 1
>         Bytes to be read 4
> last_sent = 3, IMPI_INDEX_MAX = 15
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x1300 from client 0
>         Bytes to be read 4
> last_sent = 3, IMPI_INDEX_MAX = 15
> Sending COLL:0x130000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x1400 from client 0
>         Bytes to be read 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x1400 from client 1
>         Bytes to be read 4
> last_sent = 4, IMPI_INDEX_MAX = 15
> Sending COLL:0x140000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x1500 from client 0
>         Bytes to be read 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x1500 from client 1
>         Bytes to be read 4
> last_sent = 5, IMPI_INDEX_MAX = 15
> Sending COLL:0x150000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x1600 from client 0
>         Bytes to be read 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x1600 from client 1
>         Bytes to be read 4
> last_sent = 6, IMPI_INDEX_MAX = 15
> Sending COLL:0x160000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x2000 from client 0
>         Bytes to be read 16
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x2000 from client 1
>         Bytes to be read 16
> last_sent = 7, IMPI_INDEX_MAX = 15
> Sending COLL:0x200000 to clients
>         Bytes of coll data to be sent 32
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x2100 from client 0
>         Bytes to be read 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x2100 from client 1
>         Bytes to be read 4
> last_sent = 8, IMPI_INDEX_MAX = 15
> Sending COLL:0x210000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x2200 from client 0
>         Bytes to be read 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x2200 from client 1
>         Bytes to be read 4
> last_sent = 9, IMPI_INDEX_MAX = 15
> Sending COLL:0x220000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x2300 from client 0
>         Bytes to be read 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x2300 from client 1
>         Bytes to be read 4
> last_sent = 10, IMPI_INDEX_MAX = 15
> Sending COLL:0x230000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x2400 from client 0
>         Bytes to be read 4
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x2400 from client 1
>         Bytes to be read 4
> last_sent = 11, IMPI_INDEX_MAX = 15
> Sending COLL:0x240000 to clients
>         Bytes of coll data to be sent 8
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x3000 from client 0
>         Bytes to be read 16
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x3000 from client 1
>         Bytes to be read 16
> last_sent = 12, IMPI_INDEX_MAX = 15
> Sending COLL:0x300000 to clients
>         Bytes of coll data to be sent 32
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading COLL:0x3100 from client 0
>         Bytes to be read 8
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading COLL:0x3100 from client 1
>         Bytes to be read 8
> last_sent = 13, IMPI_INDEX_MAX = 15
> Sending COLL:0x310000 to clients
>         Bytes of coll data to be sent 16
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Reading IMPI_CMD_DONE from client 0
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Reading IMPI_CMD_DONE from client 1
> last_sent = 14, IMPI_INDEX_MAX = 15
> Sending IMPI_CMD_DONE to clients
> Setting poll_clients[i].fd to 5
> Setting poll_clients[i].fd to 4
> Going to read from client 0.
> Reading from client 0 (fd: 5)
> Read from client -- client socket closed!
> Client 0 has hung up before sending IMPI_CMD_FINI.  How anti-social.
> Going to read from client 1.
> Reading from client 1 (fd: 4)
> Read from client -- client socket closed!
> Client 1 has hung up before sending IMPI_CMD_FINI.  How anti-social.
> IMPI server: Exiting main server loop.
> IMPI server: Shutting down client fd: 5
> IMPI server: Shutting down client fd: 4
>
> Jeff Squyres wrote:
> Yikes -- egg on our face.  :-(
>
> These are not your fault -- they are some bit-rot issues with the
> IMPI code that we neglected to fix.  I have committed fixes to
> Subversion; you can either build directly from an SVN checkout or wait
> for the nightly snapshot tonight.
>
> Let us know how that works for you.
>
>
> On Dec 20, 2004, at 10:40 AM, Luiz Angelo Barchet-Steffenel wrote:
>
>
> Hi!
>
> I'm having some trouble compiling LAM 7.1.1 with IMPI support. In
> fact, I tried to compile with both gcc 3.2 (on an IA64 Red Hat AS
> server) and gcc 3.3.5 (on a Pentium IV running Debian), but the error
> is always the same (as shown below).
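>
> For reference, the build sequence I am following is roughly this
> (the --with-impi flag is what I understood from the docs -- please
> correct me if that is wrong):
>
>   ./configure --with-impi
>   make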
>
> By the way, I also tried IMPI with LAM 7.0.4, but the IMPI server
> reported "LAM IMPI client: Unexpected truncation of socket read
> (IMPI_Packet header)". Does anyone have an idea about this problem?
> (The only mail related to this error on the LAM list does not help
> much.)
>
> Thank you!
>
> Luiz Angelo Steffenel
>
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/