
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-05-23 17:59:08


Try some simple tests:

- Does "tping -c 3" run successfully? (It should ping all the lamd's)
- Does "lamexec N hostname" run successfully? (It should run
"hostname" on all the booted nodes)
- When you "mpirun -np 15 ring.out", do you see ring.out executing on
all the nodes? (i.e., if you ssh into each of the nodes and run ps,
do you see it running?)
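
The third check can be scripted rather than done by logging in to each node by hand. What follows is only a rough sketch, not something shipped with LAM: the function names lam_hosts and check_running are made up for illustration, and it assumes passwordless ssh to every node and that "ps -e" on the nodes lists the process by its executable name.

```shell
#!/bin/sh
# Sketch only: lam_hosts and check_running are hypothetical helper
# names, not LAM commands.

# Read lamnodes-style lines ("n1 oscarnode1.latech:1:") on stdin and
# print just the hostname from each line.
lam_hosts() {
    awk 'NF >= 2 { split($2, parts, ":"); print parts[1] }'
}

# Read hostnames on stdin; for each one, report whether "ps -e" on that
# host shows a line containing the program name given as $1.
# Assumes passwordless ssh to every host.
check_running() {
    prog=$1
    while read -r host; do
        # -n stops ssh from consuming the rest of the host list on stdin
        if ssh -n "$host" ps -e | grep -qF -- "$prog"; then
            echo "$host: $prog running"
        else
            echo "$host: $prog NOT running"
        fi
    done
}
```

With mpirun still hung in one terminal, something like "lamnodes | lam_hosts | check_running ring.out" from a second terminal on the head node should then show which nodes actually started the program.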

On May 23, 2007, at 3:50 PM, K. Charoenpornwattana Ter wrote:

>
> ---------- Forwarded message ----------
> From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]>
> To: "General LAM/MPI mailing list" <lam_at_[hidden]>, <lam_at_lam-mpi.org>
> Date: Tue, 22 May 2007 23:56:36 -0400
> Subject: Re: LAM: lamboot is ok, mpirun is not
>
> Hi
>
> What happens when you try to mpirun an MPI application that was
> compiled with LAM's mpicc?
>
>
> It compiles successfully with LAM's mpicc, but I still have the
> problem.
> Here is what I did:
> ----------------------------------
> [ter_at_uftoscar test]$ echo $PATH
> /opt/lam-7.1.3/bin/:/opt/mpich-ch_p4-gcc-1.2.7/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin
> [ter_at_uftoscar test]$ mpicc ring.c -o ring.out
> [ter_at_uftoscar test]$ lamboot -v host
>
> LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
>
> n-1<20590> ssi:boot:base:linear: booting n0 (uftoscar)
> n-1<20590> ssi:boot:base:linear: booting n1 (oscarnode1)
> n-1<20590> ssi:boot:base:linear: booting n2 (oscarnode2)
> n-1<20590> ssi:boot:base:linear: booting n3 (oscarnode3)
> n-1<20590> ssi:boot:base:linear: booting n4 (oscarnode4)
> n-1<20590> ssi:boot:base:linear: booting n5 (oscarnode5)
> n-1<20590> ssi:boot:base:linear: booting n6 (oscarnode6)
> n-1<20590> ssi:boot:base:linear: booting n7 (oscarnode7)
> n-1<20590> ssi:boot:base:linear: booting n8 (oscarnode8)
> n-1<20590> ssi:boot:base:linear: booting n9 (oscarnode9)
> n-1<20590> ssi:boot:base:linear: booting n10 (oscarnode10)
> n-1<20590> ssi:boot:base:linear: booting n11 (oscarnode11)
> n-1<20590> ssi:boot:base:linear: booting n12 (oscarnode12)
> n-1<20590> ssi:boot:base:linear: booting n13 (oscarnode13)
> n-1<20590> ssi:boot:base:linear: booting n14 (oscarnode14)
> n-1<20590> ssi:boot:base:linear: finished
> [ter_at_uftoscar test]$ lamnodes
> n0 uftoscar.latech:1:origin,this_node
> n1 oscarnode1.latech:1:
> n2 oscarnode2.latech:1:
> n3 oscarnode3.latech:1:
> n4 oscarnode4.latech:1:
> n5 oscarnode5.latech:1:
> n6 oscarnode6.latech:1:
> n7 oscarnode7.latech:1:
> n8 oscarnode8.latech:1:
> n9 oscarnode9.latech:1:
> n10 oscarnode10.latech:1:
> n11 oscarnode11.latech:1:
> n12 oscarnode12.latech:1:
> n13 oscarnode13.latech:1:
> n14 oscarnode14.latech:1:
> [ter_at_uftoscar test]$ mpirun -np 15 -v ring.out
> 20626 ring.out running on n0 (o)
> <freeze>
>
> No firewall is running on any node in this cluster, and $PATH on
> every node starts with "/opt/lam-7.1.3/bin/".
>
> Thanks
> Ter
>
> -----Original Message-----
> From: K. Charoenpornwattana Ter [mailto:kcharoen_at_[hidden]]
> Sent: Tuesday, May 22, 2007 09:11 PM Eastern Standard Time
> To: lam_at_[hidden]
> Subject: LAM: lamboot is ok, mpirun is not
>
> Hi all,
>
> I have some problems with LAM/MPI. I have been searching around the
> net, but no one seems to have the same problem as me.
>
> My cluster has 1 head node and 14 compute nodes. I installed CentOS
> 4.5-i386 and used OSCAR 4.2.1 to help build this cluster. I
> completely uninstalled the LAM/MPI that came with OSCAR 4.2 and
> installed LAM/MPI 7.1.3 with BLCR 0.5.1.
>
>
> The problem is that I can successfully lamboot the hosts, but I can't
> execute an MPI application (even a simple hello world) on multiple
> nodes. (I can lamboot on a single node and execute
> "mpirun -np 1 hello.out".)
>
> I can ping, tping, and traceroute from the head node to every node in
> the cluster and vice versa. I can execute any MPI application on this
> cluster using MPICH.
>
> [ter_at_uftoscar ~]$ which mpirun
> /opt/lam-7.1.3/bin/mpirun
> [ter_at_uftoscar ~]$ ssh oscarnode1 which mpirun
> /opt/lam-7.1.3/bin/mpirun
>
> [ter_at_uftoscar ~]$ echo $PATH
> /opt/lam-7.1.3/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin:/opt/lam-7.1.3/bin/
> [ter_at_uftoscar ~]$ ssh oscarnode1 echo $PATH
> /opt/lam-7.1.3/bin/:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/kernel_picker/bin:/opt/env-switcher/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/root/bin:/opt/lam-7.1.3/bin/
>
>
> I am sure that the older version of LAM/MPI was completely removed,
> and I set env-switcher to none.
>
> Any help would be greatly appreciated.
>
> Thanks
> Ter
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
Cisco Systems