LAM/MPI General User's Mailing List Archives

From: Gkikas Magiorkinis (gmagi_at_[hidden])
Date: 2004-08-05 08:06:22


I think you are probably right. I found out that specific UDP packets
are rejected:
Using ttcp -t, only words of length 5, 11 and >16 are passed to ttcp -r.
Moreover, using traceroute, only 40-byte and >45-byte packets are
resolved.
Isn't this weird? Now I am almost sure that this is not a LAM problem, but I
do not know what to do. I know that this is not a networking mailing list, but
if anyone has any ideas I would be grateful for the help.
My network card is from the 3Com 3c2000 family (1 Gbit).

Gkikas
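
The size-dependent loss described above can also be checked without ttcp
by sweeping UDP payload sizes against an echo responder. The Python sketch
below is only an illustration: the names echo_server and sweep_sizes, the
size range and the timeout are assumptions, not part of ttcp or LAM. In
real use echo_server would run on the remote node; the demo uses loopback
so that it is self-contained.

```python
import socket
import threading


def echo_server(sock, stop):
    """Echo every received datagram back to its sender until stop is set."""
    sock.settimeout(0.2)
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def sweep_sizes(host, port, sizes, timeout=0.5):
    """Send one datagram per size; return {size: True if it was echoed intact}."""
    results = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for n in sizes:
            payload = bytes([n % 256]) * n
            s.sendto(payload, (host, port))
            try:
                data, _ = s.recvfrom(65535)
                results[n] = (data == payload)
            except socket.timeout:
                results[n] = False
    return results


if __name__ == "__main__":
    # Loopback demo; on a real cluster, bind the server on the remote node.
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))
    stop = threading.Event()
    worker = threading.Thread(target=echo_server, args=(srv, stop), daemon=True)
    worker.start()
    outcome = sweep_sizes("127.0.0.1", srv.getsockname()[1], range(1, 65))
    stop.set()
    worker.join()
    srv.close()
    lost = [n for n, ok in outcome.items() if not ok]
    print("payload sizes that were not echoed:", lost)
```

A list of lost sizes that follows a pattern (rather than random drops)
would point at a NIC, driver or switch issue rather than at the
application.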

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of
Jeff Squyres
Sent: Thursday, August 05, 2004 1:30 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: LAM freezes, please help!

Let me throw my $0.02 in here..

LAM effectively uses UDP for process control and other "out of band"
kinds of communications (e.g., tping heavily uses UDP). So if your
network is having UDP problems, LAM will probably not work properly.
The "switch lights kept blinking" issues you mentioned were probably
due to the LAM daemon trying to re-send UDP packets that were never
ack'ed. You also mentioned that tping would work a few times and then
stop -- that's not good at all. It means that *some* UDP packets are
getting through, but not all.

We won't rule out a LAM bug, but I would check your network
configuration with other programs that use UDP.

I don't have much to offer by way of suggestions as to *why* this would
happen; perhaps faulty hardware...?
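
To illustrate why lost UDP packets show up as continuous network
activity, here is a minimal Python sketch of a generic
send-and-wait-for-ack loop over UDP. It is NOT LAM's actual protocol; the
names send_with_retry and lossy_receiver, the b"ACK" reply, the retry
count and the timeouts are all assumptions made for the example.

```python
import socket
import threading


def send_with_retry(sock, payload, addr, retries=5, timeout=0.2):
    """Send payload and wait for b"ACK"; resend on every timeout.

    Returns the number of attempts used, or None if no ack ever arrived.
    """
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(payload, addr)
        try:
            reply, _ = sock.recvfrom(1024)
        except socket.timeout:
            continue  # nothing came back: send the same datagram again
        if reply == b"ACK":
            return attempt
    return None


def lossy_receiver(sock, drop, stop):
    """Silently ignore the first `drop` datagrams, then ack every later one."""
    seen = 0
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            _, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        seen += 1
        if seen > drop:
            sock.sendto(b"ACK", addr)


if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    stop = threading.Event()
    worker = threading.Thread(target=lossy_receiver, args=(rx, 2, stop), daemon=True)
    worker.start()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        attempts = send_with_retry(tx, b"hello", rx.getsockname())
    stop.set()
    worker.join()
    rx.close()
    print("attempts needed:", attempts)
```

When some datagrams never arrive, the sender keeps retransmitting, which
would be consistent with switch lights that keep blinking while nothing
appears to make progress.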

On Aug 5, 2004, at 4:49 AM, Gkikas Magiorkinis wrote:

>
> Neil,
>
> I tried ttcp: TCP works OK, but UDP is problematic. Do you have any
> idea why this could be happening?
>
> Gkikas
>
> -----Original Message-----
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of
> Neil Storer
> Sent: Wednesday, August 04, 2004 3:43 PM
> To: General LAM/MPI mailing list
> Subject: Re: LAM: LAM freezes, please help!
>
> Gkikas,
>
> If "ping" is working without any packets lost, e.g.
>
> ping -c 100 host-abc
>
> 100 packets transmitted, 100 received, 0% loss
>
> then your network is probably OK. You should do the "pings" from each
> node to each other node to fully test the system.
>
> TTCP is a simple client-server application that uses SOCKETs to send
> data. It doesn't use MPI; however, MPI uses SOCKETs (by default), so if
> TTCP also works OK, you have eliminated hardware and the underlying
> TCP/IP and routing as sources of the problem.
>
> For information on TTCP + how to get the source, use "google" or see:
>
> http://www.nps.navy.mil/cs/su/cs4550/proj1.1.htm
>
> The MAN page on my system is:
> NAME
>     ttcp - test TCP and UDP performance
>
> SYNOPSIS
>     ttcp -t [-lbuflen] [-s] [-nnumbufs] [-pport] [-u] [-D] [-L]
>          [-Aalign] [-Ooffset] [-v] host [<in]
>     ttcp -r [-lbuflen] [-s] [-pport] [-B] [-Aalign] [-Ooffset]
>          [-u] [-v] [>out]
>
> DESCRIPTION
>     Ttcp times the transmission and reception of data between two
>     systems using the UDP or TCP protocols. It differs from common
>     "blast" tests, which tend to measure the remote inetd as much
>     as the network performance, and which usually do not allow
>     measurements at the remote end of a UDP transmission.
>
>     Ttcp.c should be compiled for both ends of the path to be
>     tested. It uses sockets and is easy to port to most machines
>     based on 4.3BSD.
>
>     The transmitter should be started with -t after the receiver
>     has been started with -r. Tests lasting at least tens of
>     seconds should be used to obtain accurate measurements.
>     Graphical presentations of throughput versus buffer size for
>     buffers ranging from tens of bytes to several "pages" can
>     illuminate bottlenecks.
>
> Options
>     -t         Transmit mode.
>
>     -r         Receive mode.
>
>     -u         Use UDP instead of TCP.
>
>     -llength   Length of buffers in bytes (default 8192).
>
>     -nnumbufs  Number of source buffers transmitted (default 2048).
>
>     -pport     Port number to send to or listen on (default 5001).
>
>     -D         If transmitting using TCP, do not buffer data when
>                sending (sets the TCP_NODELAY socket option).
>
>     -s         If transmitting, do not source a data pattern to
>                network; use stdin instead. If receiving, do not sink
>                or discard, but print all data to stdout.
>
>     -B         When receiving and using the -s option, only output
>                full blocks, using the block size specified by -l.
>                This option is useful for programs that require
>                complete blocks, like tar(1).
>
>     -Aalign    Align the start of buffers to this modulus (default
>                16384).
>
>     -Ooffset   Align the start of buffers to this offset (default 0).
>                For example, "-A8192 -O1" causes buffers to start at
>                the second byte of an 8192-byte page.
>
>     -v         Verbose: print more statistics.
>
>     -d         Set the SO_DEBUG socket option.
>
> --
> Neil Storer, Head: Systems S/W Section, Operations Dept.
> ECMWF, Shinfield Park, Reading, Berkshire, RG2 9AX, UK
> email: neil.storer_at_[hidden]
> Tel: (+44 118) 9499353
>      (+44 118) 9499000 x 2353
> Fax: (+44 118) 9869450
> URL: http://www.ecmwf.int/
> ECMWF is the European Centre for Medium-Range Weather Forecasts
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
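
The suggestion in the quoted message to run "ping" from each node to each
other node can be organised with a small helper that lists every ordered
pair of hosts. This is a rough Python sketch under assumptions: the
function name ping_plan and the host names node1..node3 are made up for
the example, and only the "ping -c 100" form quoted above is used. It
just prints the plan; running each command on its source node is left to
whatever remote shell the cluster provides.

```python
from itertools import permutations


def ping_plan(hosts, count=100):
    """Return (source, command) pairs covering every ordered pair of hosts."""
    return [
        (src, ["ping", "-c", str(count), dst])
        for src, dst in permutations(hosts, 2)
    ]


if __name__ == "__main__":
    # Hypothetical host names; replace with the nodes in your boot schema.
    for src, cmd in ping_plan(["node1", "node2", "node3"]):
        print("on", src, "run:", " ".join(cmd))
```

Testing both directions of every pair matters because a fault in one
NIC or switch port may only show up on one path.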

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/