LAM/MPI General User's Mailing List Archives

From: Bernard Li (bli_at_[hidden])
Date: 2004-09-03 17:10:01


Hey Jeff:

What would be the best way to prevent SESSION_SUFFIX from being used,
then? Would it simply be a matter of modifying lam-conf.lamd and
commenting out $session_suffix?

I suspect the problem I am encountering has to do with the application
not being able to attach to the same session (as you described).
Also, is it possible to tell mpirun which session to attach to? I guess
I could use the LAM_MPI_SESSION_SUFFIX environment variable?

Thanks,

Bernard

> -----Original Message-----
> From: lam-bounces_at_[hidden]
> [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Friday, September 03, 2004 14:55
> To: General LAM/MPI mailing list
> Subject: Re: LAM: mpiJava on SGE 5.3p6 with LAM 7.0.x
>
> I have very little (read: no) Java experience.
>
> This message typically occurs when LAM is unable to find the
> session directory, so it concludes that a lamd is not running.
>
> I'll ask a silly question first -- does java clean out the
> environment in some way, or simply not pass the environment
> through to the running process? That would explain what's
> happening -- in a non-batch environment, no SESSION_SUFFIX
> (or any other) environment variable is necessary; LAM will
> use its default session directory name, and all the LAM
> executables (including mpirun and the MPI processes) will
> find it with no help. The only situation where you need
> SESSION_SUFFIX (etc.) is when you need a non-default session
> directory (such as under a batch system).
>
> So if lamboot put the session directory in a non-default
> place because of SESSION_SUFFIX (or some other environment
> variable), and the MPI processes don't see that environment
> variable, then they won't know where to look for the session
> directory and will give the error message that you are seeing.
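The mismatch Jeff describes can be illustrated with a small Java sketch. It assumes a simplified, hypothetical session-directory layout (/tmp/lam-USER@HOST, with -SUFFIX appended when LAM_MPI_SESSION_SUFFIX is set); LAM's real naming scheme may differ, and the class name SessionDirCheck and its sessionDir method are made up for illustration. The point is only that each process derives the directory from its own environment, so a process that did not inherit the variable looks somewhere else. Started from inside the SGE job script, main() also serves as a quick test of the "does java clean out the environment" question, because it prints what the JVM actually inherited (needs Java 8 or later).

```java
import java.net.InetAddress;
import java.util.Map;

// HYPOTHETICAL sketch: the directory layout below is an assumed stand-in for
// LAM's real session-directory naming; only LAM_MPI_SESSION_SUFFIX comes
// from the thread.
class SessionDirCheck {
    static final String SUFFIX_VAR = "LAM_MPI_SESSION_SUFFIX";

    // Directory a process would look for, based only on its own environment.
    static String sessionDir(Map<String, String> env, String host) {
        String user = env.getOrDefault("USER", "unknown");
        String dir = "/tmp/lam-" + user + "@" + host;
        String suffix = env.get(SUFFIX_VAR);
        if (suffix != null && !suffix.isEmpty()) {
            dir = dir + "-" + suffix; // non-default location (e.g. batch job)
        }
        return dir;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> env = System.getenv();
        // Show whether the variable survived the launch into this JVM.
        System.out.println(SUFFIX_VAR + " = " + env.get(SUFFIX_VAR));
        String host = InetAddress.getLocalHost().getHostName();
        System.out.println("would look in: " + sessionDir(env, host));
    }
}
```

With the same user and host, a process that sees the suffix and one that does not compute different directories, which matches the "lamd isn't running" symptom.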
>
>
> On Sep 3, 2004, at 5:41 PM, Bernard Li wrote:
>
> > Hi list:
> >
> > A little background: we have been running SGE 5.3p6 with LAM 7.0.4
> > for a while now with the tight integration script by Chris Duncan,
> > and it had been problem-free until we tried to run some mpiJava
> > applications on our cluster.
> >
> > I tried manually running the application in a lambooted environment
> > and it worked fine, but as soon as I try to submit the application via
> > SGE, it doesn't work.
> >
> > Specifically, it says that lamd isn't running, when I am perfectly
> > sure that the environment has been set up by the tight integration
> > scripts!
> >
> > In the script I submit to SGE, I even tried to execute both mpirun and
> > lamexec: lamexec works (meaning that there is a lambooted environment),
> > but mpirun with the mpiJava application just doesn't work (it
> > complains that lamd isn't running).
> >
> > I am not sure whether this has to do with the SESSION_SUFFIX bug that
> > was fixed in 7.0.6, so I tried that version, but it didn't seem to
> > help...
> >
> > The syntax to run mpiJava application is:
> >
> > mpirun -np 4 java HelloJava
> >
> > (where HelloJava is the mpiJava application)
> >
> > Can anybody think of any reason why it isn't working? It works
> > perfectly with a manually lambooted environment, but it would be much
> > cleaner if things could be farmed out via GridEngine.
> >
> > Thanks,
> >
> > Bernard
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/