
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-05-24 10:04:10


That is just weird -- I don't think I've seen a case where tping
worked (implying that inter-lamd communication is working), but
running applications did not.

The only thing I can think of is that some firewalling is in place
that lets arbitrary UDP traffic through but not TCP...? (Inter-lamd
traffic is UDP, not TCP.) That doesn't seem to make sense, though, if
MPICH works (cexec uses ssh, which is most certainly allowed). But can
you triple-check that there are no firewall rules in place that
restrict UDP/TCP traffic (e.g., iptables)?

Also try running tping / mpirun / lamexec from a node other than the
origin (i.e., the node you lambooted from).

On May 23, 2007, at 11:32 PM, K. Charoenpornwattana Ter wrote:

> Try some simple tests:
>
> - Does "tping -c 3" run successfully? (It should ping all the lamds.)
>
> [ter_at_uftoscar test]$ tping -c 3 n0-13
> 1 byte from 13 remote nodes and 1 local node: 0.006 secs
> 1 byte from 13 remote nodes and 1 local node: 0.005 secs
> 1 byte from 13 remote nodes and 1 local node: 0.005 secs
>
> 3 messages, 3 bytes (0.003K), 0.016 secs (0.368K/sec)
> roundtrip min/avg/max: 0.005/0.005/0.006
>
>
> - Does "lamexec N hostname" run successfully? (It should run
> "hostname" on all the booted nodes)
>
> No, it doesn't work. It only shows the head node's hostname. See below:
>
> [ter_at_uftoscar ~]$ lamexec N hostname
> uftoscar.latech
> <freeze>
>
> I, however, can execute "cexec hostname" with no problem.
>
> - When you "mpirun -np 15 ring.out", do you see ring.out executing on
> all the nodes? (i.e., if you ssh into each of the nodes and run ps,
> do you see it running?)
>
> I only see one ring.out running on the head node; no ring.out is
> running on the other nodes.
>
>
> Thanks
> Kulathep
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
Cisco Systems