
LAM/MPI General User's Mailing List Archives


From: Christopher Porter (Cporter_at_[hidden])
Date: 2006-04-24 16:02:38


There is a method under the 7.x series in an LSF environment to use a remote execution utility called "lsgrun". Below is an excerpt from a document I've been writing.

 

ENVIRONMENT

 

Several enterprise environments which use Platform software to manage their workload also put constraints on users to goad them into using the LSF system. One of these constraints is denying direct login to execution hosts. In general this does not prevent users from submitting jobs and handing off their launch to LSF, but where LAM MPI is required such an environment is problematic because of the required "lamboot" process. This process assumes rsh/ssh login access to get the MPI daemons started. That assumption conflicts with the no-login environment and can force problematic exceptions to environment management rules.

 

In some cases, third-party applications are delivered with pre-compiled LAM-MPI libraries, so solutions to this problem should not require code modification or recompilation of the LAM libraries if at all possible.

 

GOAL

 

Since LSF has its own internal authentication mechanism, it is possible to use the base command "lsgrun" instead of rsh/ssh to make the remote connection to execution hosts and launch the daemons. Once the daemons are launched, the job can be launched in a similar fashion to any serial job.

 

METHOD

 

LAM-MPI has always used "rsh" as its default remote connection method to start the communication daemons it uses during a parallel application run. As security became more important in compute environments, the developers of LAM introduced an environment variable, LAMRSH, which, if defined, redirects the remote connection to another application. Most folks use "ssh". In this case we set it as follows:

 

LAMRSH="lsgrun -m"

 

This redirection alone is not enough, though. The lamboot code that constructs the system call command lines assumes that whatever program is named in the LAMRSH variable accepts the same switches (such as "-n") with the same meaning as rsh. This isn't true for "lsgrun", so further modification is needed.
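To make the switch problem concrete, here is a rough illustration of how a launch line changes with the agent. It is a sketch only: the layout "<agent> <host> [switches] <command>" is an assumption about how lamboot assembles the line, "compose" is a hypothetical helper, and "daemon_cmd" is a placeholder rather than the real daemon start command.

```shell
# Illustration only: compose a remote launch line as
#   <agent> <host> [switches] <command>
# The exact line lamboot builds is an assumption; "daemon_cmd" is a placeholder.
compose() {
    agent=$1; host=$2; flags=$3; cmd=$4
    # ${flags:+ $flags} adds the switches only when some were given.
    echo "$agent $host${flags:+ $flags} $cmd"
}

compose "rsh" hostA "-n" daemon_cmd          # rsh hostA -n daemon_cmd
compose "lsgrun -m" hostA "-n" daemon_cmd    # lsgrun -m hostA -n daemon_cmd
compose "lsgrun -m" hostA "" daemon_cmd      # lsgrun -m hostA daemon_cmd
```

In the second line the "-n" no longer carries its rsh meaning, which is why the switch has to be suppressed when "lsgrun" is the agent.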

 

Thankfully, recent versions of the LAM documentation are complete enough to give a peek under the hood, so one can control the daemon boot process at a more fundamental level.

 

Chapter 8 of the user guide (http://www.lam-mpi.org/using/docs/) describes LAM Modules. LAM uses a plugin architecture called "boot modules" to integrate with various environments, including Scyld Beowulf, Globus, PBS/Torque, and the generic rsh/ssh environment. The generic environment is the one we intend to operate upon, since no explicit LSF integration exists.

 

Table 8.3 is especially interesting because it details the additional parameters that can be passed to the boot module used by the lamboot process. The most interesting ones here are boot_rsh_no_n, boot_rsh_fast, boot_rsh_ignore_stderr, and boot_rsh_no_profile:

   boot_rsh_no_n - removes the "-n" switch normally applied to rsh/ssh system calls,
                   which under those protocols redirects input from the special
                   device /dev/null

   boot_rsh_fast - skips the step which determines the shell on remote hosts and
                   assumes the local shell is used

   boot_rsh_ignore_stderr - by default, the lamboot process terminates before
                   completing if any data shows up on STDERR. If the "lsgrun"
                   substitution produces some output on this device, this switch
                   could be very handy, especially if the data is inconsequential
                   to the overall goal

   boot_rsh_no_profile - prevents the daemon launch process from trying to run the
                   user's .profile
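Besides the "-ssi <name> <value>" command-line form used in the setup below, the LAM user guide also describes setting SSI parameters through environment variables prefixed with LAM_MPI_SSI_. A minimal sketch, assuming that naming convention applies to the boot_rsh parameters listed above (worth verifying against Chapter 8 for your LAM version):

```shell
# Assumption: SSI parameters can be set as LAM_MPI_SSI_<parameter name>.
# Check the user guide for your LAM version before relying on this.
LAM_MPI_SSI_boot_rsh_no_n=1
LAM_MPI_SSI_boot_rsh_fast=1
LAM_MPI_SSI_boot_rsh_no_profile=1
export LAM_MPI_SSI_boot_rsh_no_n LAM_MPI_SSI_boot_rsh_fast LAM_MPI_SSI_boot_rsh_no_profile
```

This can be convenient when a third-party wrapper calls lamboot itself and its command line cannot be changed.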

 

One can successfully boot LAM using "lsgrun" in an environment where users cannot log in via rsh or ssh (for instance, where /etc/passwd contains shell definitions of /bin/false) with the following setup:

 

for the Bourne Again Shell (bash):

  1) export LAMRSH="lsgrun -m"

  2) lamboot -v -ssi boot_rsh_no_n 1 -ssi boot_rsh_fast 1 -ssi boot_rsh_no_profile 1 <schema file>

 

for csh or tcsh:

  1) setenv LAMRSH "lsgrun -m"

  2) lamboot -v -ssi boot_rsh_no_n 1 -ssi boot_rsh_fast 1 -ssi boot_rsh_no_profile 1 <schema file>
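Inside an LSF job script the schema file can be generated from the hosts LSF allocated. The following is a minimal sketch, not a tested integration. It assumes LSF exports the allocated hosts in LSB_HOSTS as a space-separated list with one entry per slot, and that the boot schema accepts "hostname cpu=N" lines; the function names and the schema file path are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and the schema path are illustrative.

# Turn a space-separated host list (one entry per slot, as assumed for
# LSB_HOSTS) into boot schema lines "host cpu=N", keeping first-seen order.
hosts_to_schema() {
    # $1 is left unquoted on purpose so the shell splits it into hosts.
    printf '%s\n' $1 | awk '
        NF && !($1 in n) { order[++k] = $1 }
        NF               { n[$1]++ }
        END { for (i = 1; i <= k; i++) print order[i] " cpu=" n[order[i]] }'
}

# Print (do not run) the lamboot command from step 2 for a given schema file.
lamboot_cmd() {
    echo "lamboot -v -ssi boot_rsh_no_n 1 -ssi boot_rsh_fast 1 -ssi boot_rsh_no_profile 1 $1"
}

# Possible use in a job script (commented out so the sketch has no side effects):
#   export LAMRSH="lsgrun -m"
#   schema="$HOME/lamhosts.$LSB_JOBID"        # hypothetical location
#   hosts_to_schema "$LSB_HOSTS" > "$schema"
#   eval "$(lamboot_cmd "$schema")"
```

For example, hosts_to_schema "hostA hostA hostB" prints "hostA cpu=2" followed by "hostB cpu=1".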

 

 

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Phil Ehrens
Sent: Monday, April 24, 2006 12:44 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: lamboot without rsh/ssh

 

YoungHui Amend wrote:

> I'm using an old version (6.3) of LAM/MPI. In this version of LAM,

> lamboot uses rsh to run hboot which forks the LAM daemon (lamd). The problem

> we are running in to is that rsh is not allowed (for security reasons)

> on the cluster of machines connected to LSF. Ssh also causes problems

> because it prompts you for the password. I know there's a way to setup

> ssh so it doesn't prompt for a password, but it is not a viable option.

> So, is there a way to fork the LAM daemon without going through lamboot?

 

You have an installation where rsh can't be used due to security

concerns... and ssh can't be used because it's "not viable"?

 

Do they make you communicate with your coworkers by blinking flashlights

at them?

_______________________________________________

This list is archived at http://www.lam-mpi.org/MailArchives/lam/