LAM/MPI General User's Mailing List Archives

From: Bogdan Costescu (bogdan.costescu_at_[hidden])
Date: 2004-08-09 17:00:53


On 9 Aug 2004, Satya Gosula wrote:

> I read in the manual that I need to manually set the
> LAM_MPI_SESSION_SUFFIX environment variable. There is no information
> as to what this variable should be set to or how to do it.

Thank you for reading the manuals before asking the question.

The ...SUFFIX (and ...PREFIX if you wish) variables can be set to
any string you want. LAM will then use them to build the session
name that is used for all communication with the daemons in that
session.

In order for two sessions to have different names, the ...SUFFIX
(and/or ...PREFIX) have to be different. For the 2 queueing systems
that you mentioned, LAM tries to use (mainly) the job id as the part
that differs from one session to another. Given that the job id is
supposed to be uniquely assigned to a job (within a certain time
frame), it's quite safe to assume that two session names derived this
way will be different.

A few years ago, I wrote a series of scripts that used the process id
of the qsub-like script on the main node as the variable part of the
session name. This also worked fine, as the wraparound time of the
PIDs was still larger than the normal runtime of the jobs.

As the name says, the ...SUFFIX and ...PREFIX variables are part of
the environment. So you can set them with something like:

LAM_MPI_SESSION_SUFFIX=myqueue-$jobid
export LAM_MPI_SESSION_SUFFIX

for bash/bsh/ksh/zsh/etc. and

setenv LAM_MPI_SESSION_SUFFIX myqueue-$jobid

for (t)csh, where $jobid is some unique number.
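
As a rough sketch of how the suffix could be built in a Bourne-style
batch script: the snippet below takes the job id from an environment
variable and falls back to the script's own PID when none is set. The
name JOB_ID is only an assumption for illustration; use whatever
variable your queueing system actually exports. The "myqueue" label is
arbitrary.

```shell
#!/bin/sh
# JOB_ID is a placeholder (assumption) for the variable your queueing
# system exports; if it is unset or empty, fall back to this shell's PID ($$).
jobid="${JOB_ID:-$$}"

LAM_MPI_SESSION_SUFFIX="myqueue-$jobid"
export LAM_MPI_SESSION_SUFFIX

echo "LAM session suffix: $LAM_MPI_SESSION_SUFFIX"
```

The PID fallback is only as unique as described above: it relies on the
PIDs not wrapping around while an older job is still running.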

They have to be set before running 'lamboot' so that the daemons are
aware of the session name. Then any client program (mpirun, laminfo,
etc.) that wants to access this session has to be started in a shell
where these variables were also assigned the same value. Usually, the
sequence goes like (example for tcsh):

setenv LAMNODES "name of file that contains list of nodes allocated to this job by the queueing system"
setenv LAM_MPI_SESSION_SUFFIX myqueue-$jobid
lamboot ${LAMNODES}
mpirun C mpiprogram
lamhalt

This could for example be the whole batch script file that is
submitted to the queueing system.
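
For Bourne-style shells, the same sequence might look like the sketch
below. It is wrapped in a function so the steps are easy to reuse; the
function name run_lam_job, its arguments and the "myqueue" label are
made up for illustration, and only lamboot, mpirun and lamhalt come
from the example above.

```shell
#!/bin/sh
# Hypothetical helper: run_lam_job JOBID NODEFILE PROGRAM
#   JOBID    - unique id for this job (e.g. from the queueing system)
#   NODEFILE - file listing the nodes allocated to this job
#   PROGRAM  - MPI program to start
run_lam_job() {
    LAM_MPI_SESSION_SUFFIX="myqueue-$1"
    export LAM_MPI_SESSION_SUFFIX   # must be set before lamboot

    lamboot "$2" || return 1        # start the daemons for this session
    mpirun C "$3"                   # same shell, so same session name
    status=$?
    lamhalt                         # shut the daemons down again
    return $status
}
```

A batch script would then end with a single call such as
`run_lam_job "$jobid" "$LAMNODES" mpiprogram`, with $jobid and
$LAMNODES filled in from the queueing system.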

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/