LAM/MPI General User's Mailing List Archives

From: Neil Storer (Neil.Storer_at_[hidden])
Date: 2004-08-04 06:29:45


Gkikas,

Have you tried running a TCP/IP test (e.g. TTCP, ping, traceroute) between
the various nodes, to make sure that you don't have an intermittent network
problem? This should indicate whether the problem is really a LAM issue or
a hardware one.
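
If TTCP is not to hand, a sustained TCP round-trip check along the following lines might serve as a rough substitute. It is only a sketch and has nothing to do with LAM itself: the function names, message size, and timeout are all made up for illustration. The idea is that a plain ping only exercises ICMP, whereas repeated TCP exchanges over one connection are closer to what tping and the MPI programs do.

```python
import socket


def echo_server(host, port, ready=None):
    """Accept a single TCP connection and echo back everything received."""
    with socket.create_server((host, port)) as srv:
        if ready is not None:
            ready.set()  # tell the caller that the server is listening
        conn, _ = srv.accept()
        with conn:
            while True:
                data = conn.recv(65536)
                if not data:  # peer closed the connection
                    break
                conn.sendall(data)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)


def round_trips(host, port, count=100, size=1024, timeout=5.0):
    """Send `count` messages and return how many came back intact.

    A result smaller than `count` means an exchange timed out, failed,
    or returned corrupted data at that point.
    """
    payload = bytes(size)
    completed = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for _ in range(count):
            try:
                sock.sendall(payload)
                if recv_exact(sock, size) != payload:
                    break
            except OSError:  # covers timeouts and connection errors
                break
            completed += 1
    return completed
```

One node would run echo_server("0.0.0.0", 5001) and another would call round_trips with that node's address and the same port (5001 is an arbitrary choice). If the count returned is consistently lower than the count requested, that points at the network or the adapters rather than at LAM.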

Regards
        Neil Storer

Gkikas Magiorkinis wrote:
> Hello again!
>
> I posted earlier about a problem with tping.
>
>
>
> I have built a mini cluster composed of the following:
>
> 3* PC Intel P4 2.88 Mhz, 256 MB RAM, 20 GB HD, 1Gbit LAN
> 1* PC Intel P4 2.88 Mhz, 512 MB RAM, 120 GB HD, 1Gbit LAN
> Red Hat Linux 9 (Shrike)
> Private network (192.168.1.1-192.168.1.4)
>
>
>
> As I mentioned before, I have a problem with LAM: although lamboot
> started without errors on all nodes and the programs compiled cleanly,
> the programs froze at the very beginning when I tried to execute them.
> The network switch showed traffic as soon as I started the programs,
> and the traffic continued even after I closed the terminal.
>
> Searching for an answer, I tried to tping the nodes as follows:
>
> tping from n0 to n0 worked fine
> tping from n0 to n1 froze at the third ping
> tping from n1 to n1 worked fine
> tping from n1 to n0 froze at the third ping
>
> As soon as tping froze I closed the terminal. I noticed that the switch
> started to blink only when I started tping, and it continued to blink
> until I logged in again and ran lamhalt or lamboot on all the nodes.
>
>
>
> The strange thing is that when I tpinged n1 from n0 (or n0 from n1) once,
> it worked, but the switch kept blinking even though tping had not
> frozen. I then tpinged once more: the statistics were fine and the
> switch was still blinking (it never stopped during all this time). When
> I tpinged a third time, it froze. Could it be a memory-related bug?
>
>
>
> Nevertheless, the normal ping command works fine. There is no firewall
> on my machine, and I have also disabled iptables. I have reformatted and
> set up the cluster again. I built LAM from source: I compiled version
> 7.0.6 and had the problem, then compiled version 6.5.9 and still had the
> same problem. I am using an NFS folder so that I do not have to install
> the packages on every node separately.
>
>
>
> I really do not know what to do.
>
> Is it a hardware problem I should check? I am using a 3COM 1Gbit 16-port
> switch and 3COM 1Gbit LAN adapters.
> Are there any daemons that might conflict with lamd?
> Is there any network check procedure I could perform?
>
> I have also compiled MPICH 1.2.5.2, which works fine, but some programs
> are claimed to run better with LAM.
>
> Please help. Any suggestion would be helpful!
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
+-----------------+---------------------------------+------------------+
| Neil Storer     |    Head: Systems S/W Section    | Operations Dept. |
+-----------------+---------------------------------+------------------+
| ECMWF,          | email: neil.storer_at_[hidden]    |    //=\\  //=\\  |
| Shinfield Park, | Tel:   (+44 118) 9499353        |   //   \\//   \\ |
| Reading,        |        (+44 118) 9499000 x 2353 | ECMWF            |
| Berkshire,      | Fax:   (+44 118) 9869450        | ECMWF            |
| RG2 9AX,        |                                 |   \\   //\\   // |
| UK              | URL:   http://www.ecmwf.int/    |    \\=//  \\=//  |
+--+--------------+---------------------------------+----------------+-+
    | ECMWF is the European Centre for Medium-Range Weather Forecasts |
    +-----------------------------------------------------------------+