LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-09-03 16:55:12


I have very little (read: no) Java experience.

This message typically occurs when LAM is unable to find the session
directory, so it concludes that a lamd is not running.

I'll ask a silly question first -- does Java clean out the environment
in some way, or simply not pass the environment through to the running
process? That would explain what's happening -- in a non-batch
environment, no SESSION_SUFFIX (or any other) environment variable is
necessary; LAM will use its default session directory name, and all the
LAM executables (including mpirun and the MPI processes) will find it
with no help. The only situation where you need SESSION_SUFFIX (etc.)
is when you need a non-default session directory (like a batch system).
So if lamboot put the session directory in a non-default place because of
SESSION_SUFFIX (or some other environment variable) and the MPI
processes don't see that environment variable, they won't know where to
look for the session directory, and they will give the error message
that you are seeing.
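
One way to test this is a small Java program that prints what the JVM
actually sees. The sketch below is only an illustration: the class name
EnvCheck is hypothetical, and the variable names it checks
(LAM_MPI_SESSION_SUFFIX, LAM_MPI_SESSION_PREFIX, TMPDIR, JOB_ID) are
assumptions about what the SGE integration script sets, so adjust the list
to match your setup. It also assumes a JVM where System.getenv() is
available.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: report which session-related environment variables this JVM can see.
class EnvCheck {
    // Assumed names; replace with whatever your integration script exports.
    static final String[] NAMES = {
        "LAM_MPI_SESSION_SUFFIX", "LAM_MPI_SESSION_PREFIX", "TMPDIR", "JOB_ID"
    };

    // Look up each name in the given environment map, marking missing ones.
    static Map<String, String> report(Map<String, String> env) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : NAMES) {
            String value = env.get(name);
            out.put(name, value == null ? "(not set)" : value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Print one NAME=value line per variable, in the order listed above.
        for (Map.Entry<String, String> e : report(System.getenv()).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Running this from inside the SGE job script and comparing its output with
what the shell itself reports in that same script should show whether the
variables are lost on the way into the Java process.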

On Sep 3, 2004, at 5:41 PM, Bernard Li wrote:

> Hi list:
>
> A little background - we have been running SGE 5.3p6 with LAM 7.0.4 for
> a while now with the tight integration script by Chris Duncan, and it
> has been problem-free until we tried to run some mpiJava applications
> on our cluster.
>
> I tried manually running the application in a lambooted environment and
> it works fine, but as soon as I try to submit the application via SGE,
> it doesn't work.
>
> Specifically, it says that lamd isn't running, when I am perfectly sure
> that the environment has been set up by the tight integration scripts!
>
> In the script I submit to SGE, I even tried to execute both mpirun and
> lamexec, and lamexec would work (meaning that there is a lambooted
> environment) but mpirun with the mpiJava application just doesn't work
> (it complains that lamd isn't running).
>
> I am not sure whether this has to do with the SESSION_SUFFIX bug that
> has been fixed with 7.0.6 so I tried that but it didn't seem to help...
>
> The syntax to run mpiJava application is:
>
> mpirun -np 4 java HelloJava
>
> (where HelloJava is the mpiJava application)
>
> Can anybody think of any reason why it isn't working? I mean it works
> perfectly with a manually lambooted environment, but it would be much
> cleaner if things can be farmed off via GridEngine.
>
> Thanks,
>
> Bernard
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/