Hi list:
A little background: we have been running SGE 5.3p6 with LAM 7.0.4 for
a while now, using the tight integration script by Chris Duncan, and it
has been trouble-free until we tried to run some mpiJava applications
on our cluster.
Running the application manually in a lambooted environment works fine,
but as soon as I submit the same application via SGE, it fails.
Specifically, mpirun reports that lamd isn't running, even though I am
sure that the environment has been set up by the tight integration
scripts!
In the script I submit to SGE, I even tried executing both mpirun and
lamexec. lamexec works (meaning that there is a lambooted environment),
but mpirun with the mpiJava application does not (it complains that
lamd isn't running).
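For anyone who wants to reproduce this, here is a minimal sketch of the kind
of test job described above. It is not the exact script we submit: the PE
name "lam" and the two helper functions (show_lam_env, try_cmd) are
assumptions made up for illustration. It prints the variables that, as far as
I understand, LAM uses to locate its session directory, then runs lamexec and
mpirun in turn and reports each exit status.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -pe lam 4
# NOTE: the PE name "lam" above is an assumption; substitute the name of
# your site's tight-integration parallel environment.

# Print the variables that are likely to matter for finding the LAM session.
# (Hypothetical helper, not part of LAM or SGE.)
show_lam_env() {
    for name in TMPDIR LAM_MPI_SESSION_SUFFIX JOB_ID NSLOTS; do
        eval "value=\${$name:-not-set}"
        echo "$name=$value"
    done
}

# Run a command if it is on PATH and report its exit status.
# (Hypothetical helper, not part of LAM or SGE.)
try_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
        echo "$1 exit status: $?"
    else
        echo "$1 not found on PATH"
    fi
}

show_lam_env
try_cmd lamexec N hostname             # works in our case
try_cmd mpirun -np 4 java HelloJava    # fails: "lamd isn't running"
```

Comparing the show_lam_env output of this job with the same function run in
a manually lambooted shell should show whether the two environments differ.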
I am not sure whether this has to do with the SESSION_SUFFIX bug that
was fixed in 7.0.6, so I tried that version, but it didn't seem to help.
The syntax to run an mpiJava application is:
mpirun -np 4 java HelloJava
(where HelloJava is the mpiJava application)
Can anybody think of a reason why it isn't working? It works perfectly
in a manually lambooted environment, but it would be much cleaner if
jobs could be farmed out via GridEngine.
Thanks,
Bernard