
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-06-27 10:17:10


How about upgrading to Open MPI? LAM/MPI is ancient; we do not recommend it for anyone starting out with MPI.

Also, someone just posted a guide to getting OMPI running nicely on EC2:

    http://www.open-mpi.org/community/lists/users/2011/06/16717.php
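
For what it's worth, here is a rough sketch of the Open MPI equivalent of your
setup (the hostnames are the ones from your /etc/hosts; the slot counts are
just an assumption of one process per instance). There is no lamboot step:
Open MPI launches its daemons over ssh when mpirun is invoked.

    # hostfile -- slots=N is how many processes Open MPI may place on that host
    master slots=1
    node1  slots=1
    node2  slots=1

    $ mpicc -g -o hello hello.c
    $ mpirun --hostfile hostfile -np 3 ./hello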

On Jun 27, 2011, at 9:29 AM, Christian Baun wrote:

> Hi all,
>
> I am trying to set up a LAM/MPI cluster with 3 nodes inside AWS EC2.
> Three instances with Ubuntu 10.10 are running.
>
> The /etc/hosts on the master node includes these lines:
> 10.86.209.175 ip-10-86-209-175.ec2.internal master
> 10.122.171.209 ip-10-122-171-209.ec2.internal node1
> 10.252.86.100 domU-12-31-38-00-51-96.compute-1.internal node2
>
> I also created a file /home/ubuntu/hosts.mpi with these lines:
> master
> node1
> node2
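>
> For reference, I read that a LAM boot schema can also carry per-host
> attributes such as a CPU count -- a sketch, assuming one CPU per instance:
>
> # hosts.mpi -- cpu=N lets "mpirun C" start N processes on that host
> master cpu=1
> node1 cpu=1
> node2 cpu=1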
>
> This part looks good:
> $ lamboot -v hosts.mpi
>
> LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
>
> n-1<6731> ssi:boot:base:linear: booting n0 (master)
> n-1<6731> ssi:boot:base:linear: booting n1 (node1)
> n-1<6731> ssi:boot:base:linear: booting n2 (node2)
> n-1<6731> ssi:boot:base:linear: finished
>
> $ lamnodes
> n0 ip-10-86-209-175.ec2.internal:1:origin,this_node
> n1 ip-10-122-171-209.ec2.internal:1:
> n2 domU-12-31-38-00-51-96.compute-1.internal:1:
>
> $ lamnodes -ic
> n0 10.86.209.175
> n1 10.122.171.209
> n2 10.252.86.100
>
> At node1: $ ps -x | grep lamd
> 5870 ? Ss 0:00 /usr/bin/lamd -H 10.86.209.175 -P 53768 -n 1 -o 0
>
> At node2: $ ps -x | grep lamd
> 5287 ? Ss 0:00 /usr/bin/lamd -H 10.86.209.175 -P 53768 -n 2 -o 0
>
> But when I try to run the popular "Hello world from process..." example like this one:
> http://www.dartmouth.edu/~rc/classes/intro_mpi/hello_world_ex.html
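>
> For reference, that example is essentially the following minimal MPI program
> (the exact printf wording may differ from the linked page):
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[])
> {
>     int rank, size;
>     MPI_Init(&argc, &argv);                 /* start MPI */
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
>     MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
>     printf("Hello world from process %d of %d\n", rank, size);
>     MPI_Finalize();                         /* shut down MPI */
>     return 0;
> }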
>
> $ mpicc -g -o hello hello.c
> $ mpirun -np 3 hello
> Hello world from process 0 of 1
> Hello world from process 0 of 1
> Hello world from process 0 of 1
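>
> As an aside, the LAM documentation says mpirun also accepts node/CPU sets
> instead of -np, e.g.:
>
> $ mpirun N hello     # one copy on every node in the booted LAM universe
> $ mpirun C hello     # one copy on every CPU declared in the boot schema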
>
> When I try to force the distribution with -nolocal, I get an error message:
>
> $ mpirun -np 3 -nolocal hello
> --------------------------------------------------------------------------
> There are no available nodes allocated to this job. This could be because
> no nodes were found or all the available nodes were already used.
>
> Note that since the -nolocal option was given no processes can be
> launched on the local node.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
>
> What did I do wrong?
>
> Thanks for any help.
>
> Best Regards
> Christian
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/