Brian,
I got it working with help from Micha Arndt on the mailing list. There were
two problems. First, I was unaware that the LSB_JOBINDEX environment variable
was used by LAM, and my environment stripper was unsetting it (we have lots of
trouble with end-user jobs because of stray environment variables, so I strip
out everything except what we really need). Second, there was a problem with
the rsh-replacement scripts I was using. For obscure reasons they worked
without the session suffix but not with it. I have fixed that now.
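A minimal sketch of an allowlist-style environment stripper, in Python, assuming the site's stripper works by copying only approved variables into the job environment. Only LSB_JOBINDEX and LAM_MPI_SESSION_SUFFIX come from this thread; the other names and the `LSB_`/`LAM_` prefix rule are hypothetical site policy. Keeping whole prefixes is one way to avoid silently dropping a variable that LAM or LSF reads but that nobody thought to list.

```python
import os

# Hypothetical allowlist. LSB_JOBINDEX and LAM_MPI_SESSION_SUFFIX are the
# variables from this thread; the rest stand in for site-specific needs.
KEEP_NAMES = {
    "PATH",
    "HOME",
    "USER",
    "LD_LIBRARY_PATH",
    "LSB_JOBINDEX",            # read by LAM; stripping it was one cause of the failure
    "LAM_MPI_SESSION_SUFFIX",  # lets one user run several LAM sessions per host
}

# Hypothetical policy: also keep every LSF (LSB_*) and LAM (LAM_*) variable,
# so a variable that LAM or LSF starts relying on is not silently removed.
KEEP_PREFIXES = ("LSB_", "LAM_")


def strip_environment(env, keep_names=KEEP_NAMES, keep_prefixes=KEEP_PREFIXES):
    """Return a new dict holding only the allowlisted variables from env."""
    return {
        name: value
        for name, value in env.items()
        if name in keep_names or name.startswith(keep_prefixes)
    }


if __name__ == "__main__":
    # Print the names that would survive stripping in the current environment.
    for name in sorted(strip_environment(os.environ)):
        print(name)
```

The resulting dict could be passed as `env=` to `subprocess.run` when launching the job, so the stripped environment is what the child process sees.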
I replied to thank Micha, but I didn't post to the whole list - I should
have done that so you would know the resolution.
I love free software - I got a response from the mailing list in less than an
hour and had a working fix for my problem in under a day :-)
Thanks for following up.
Dave
-----Original Message-----
From: Brian Barrett [mailto:brbarret_at_[hidden]]
Sent: Thursday, November 06, 2003 9:10 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: Problem with LAM_MPI_SESSION_SUFFIX
On Nov 3, 2003, at 7:50 AM, Topp, Dave (GEAE) wrote:
> I am running LAM-7.0.2 on Red Hat 7.3 (2.4.20-19.7smp) and am having
> problems getting LAM to boot when I specify the LAM_MPI_SESSION_SUFFIX
> environment variable. I am running jobs under LSF Batch (but not using
> the LSF Parallel product). The documentation says if I specify this
> variable, I will be able to support multiple mpirun sessions from the
> same user on the same host - something I need in our batch environment.
> I am using my own rsh command (using LSF functionality to start remote
> processes rather than rsh). LAM will boot when I don't specify the
> session suffix. When I try with suffix specified, I get something like
> this:
I don't see anything obviously wrong. Can you send me the output of
running "lamboot -v -d"? The extra output might expose something
obvious. From the limited information I have, it almost looks like
lamboot is finding an older hboot that doesn't understand the
-sessionsuffix flag. You might want to try something like

    /afs/ae.ge.com/apps/lam/LINUX24/lam-7.0.2_prod_ge/ge/ge_rsh.ksh where hboot

and make sure that you are using the hboot you expect to be using.
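The check suggested here boils down to asking which `hboot` the remote shell actually runs. A rough Python equivalent for a single node lists every executable named `hboot` on a PATH in search order, so an older copy shadowing the expected LAM 7.0.2 install stands out. This is a sketch under stated assumptions: `find_all_on_path` is a hypothetical helper, it inspects only the PATH it is given (so it has to run on the remote node under the same environment the rsh replacement provides), and skipping empty PATH entries, which POSIX treats as the current directory, is a simplification.

```python
import os


def find_all_on_path(program, path=None):
    """List every executable named `program` on a PATH string, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            # Simplification: ignore empty entries (POSIX would use the cwd).
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    matches = find_all_on_path("hboot")
    if not matches:
        print("hboot not found on PATH")
    else:
        # The first match is the one a shell would execute.
        print("first match (what gets run):", matches[0])
        for extra in matches[1:]:
            print("also on PATH (shadowed):", extra)
```

If the first match is not the hboot from the expected LAM installation, that would fit the "older hboot that doesn't understand -sessionsuffix" theory.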
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/