LAM/MPI General User's Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2005-09-06 14:59:38


Jake,

Thanks for the detailed message. The problem you describe was a bug in
the 7.1.1 rsh boot module, and has been fixed in the latest beta. You
can acquire the beta (7.1.2 beta 25) from here:
http://www.lam-mpi.org/beta/

That should fix this problem for you. Let us know if it doesn't.
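
For reference, the "ksh: [: missing ]" line in the quoted output is what a
POSIX shell reports when the closing "]" of a test is glued to the previous
word, as in the "[ -e ./.profile]" part of the generated command. The
following is only a minimal sketch of that shell behavior, not the actual
boot module code; it uses "/" as a stand-in path that always exists, and the
variable names are made up for the illustration.

```shell
#!/bin/sh
# Stand-in path (assumption for this sketch): "/" always exists,
# so only the syntax differs between the two tests below.
f=/

# Glued form, as in the quoted command: the word passed to "[" is "/]",
# so "[" never sees a separate closing "]" and fails with a usage error.
if [ -e "$f"] 2>/dev/null; then
    glued_status=0
else
    glued_status=$?
fi

# Spaced form: "]" is its own argument, so the existence test runs normally.
if [ -e "$f" ]; then
    spaced_status=0
else
    spaced_status=$?
fi

echo "glued=$glued_status spaced=$spaced_status"
```

The glued form exits with an error status regardless of whether the file
exists, which is why the remote command fails before tkill is ever run.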

Cheers,
Josh

On Sep 6, 2005, at 12:31 PM, <dick_at_[hidden]> wrote:

> greetz,
>
> i've been playing around with the LAM-MPI 7.1.1 source and
> have tried to get it to run on openbsd 3.5 without success. by
> "get it to run" i mean that basic tests to check that it works
> correctly fail (i'll expand on this below). i find this odd
> since it is claimed that openbsd 3.5 is a tested platform for
> LAM-MPI 7.1.1 (see
> http://lam-mpi.lzu.edu.cn/about/overview/support.php ).
>
> since there is a port for LAM-MPI 6.5.9 (the old unsupported
> version) on openbsd 3.6 and later, i tested it on a 3.6
> install. i made sure things were working by doing a "$ recon
> -v bhost.def" and a "$ lamboot -v bhost.def" without getting
> any errors, where bhost.def contains the two node hostnames in
> question. i also successfully compiled and ran most of the
> example programs in the examples directory of the 6.5.9 source
> tree on the two test nodes.
>
> i did get 7.1.1 to compile and install correctly, but basic
> commands don't work and i get errors when i try to do anything
> remotely. i did change the RSH agent to be ssh, so the only
> thing non-default i did was to '$ ./configure --with-rsh="ssh
> -x"'. here are the problems:
>
> 1) recon and lamboot give me grief about remote computer
> (NOTE: i have ssh working with public key authentication just
> fine)
>
> $ recon -v bhost.def
>
>
> n-1<6817> ssi:boot:base:linear: booting n0 (craptiva.plf)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> ksh: [: missing ]
> -----------------------------------------------------------------------------
> LAM failed to execute a LAM binary on the remote node "craptiva.plf".
> Since LAM was already able to determine your remote shell as "tkill",
> it is probable that this is not an authentication problem.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> LAM tried to use the remote agent command "ssh"
> to invoke the following command:
>
> ssh -x craptiva.plf -n '( ! [ -e ./.profile] || . ./.profile;' tkill -N -v )
>
> This can indicate several things. You should check the following:
>
> - The LAM binaries are in your $PATH
> - You can run the LAM binaries
> - The $PATH variable is set properly before your .cshrc/.profile exits
>
> Try to invoke the command listed above manually at a Unix prompt.
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<6817> ssi:boot:base:linear: Failed to boot n0 (craptiva.plf)
> n-1<6817> ssi:boot:base:linear: aborted!
>
> i suspect that this has to do with some mucked shell syntax,
> but i'm not sure
>
> 2) laminfo just hangs, irrespective of the arguments i pass it
>
> 3) mpirun hangs when i try to test the examples on a single node
>
> $ lamboot -v
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> n-1<10867> ssi:boot:base:linear: booting n0 (localhost)
> n-1<10867> ssi:boot:base:linear: finished
> $ mpirun C ring
> ^C-----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n-809558100).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
> so, given these issues with 7.1.1, i wonder if i should try to
> work through the errors, provided a developer/more educated
> user is willing to help, or whether i should just work with
> the functioning 6.5.9 port. i would rather go forward than use
> a dated version of LAM-MPI which i would likely have to
> upgrade later.
>
> any suggestions welcome (aside from "use another OS"), thx for
> reading.
>
> jake

----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/