
LAM/MPI General User's Mailing List Archives


From: Bogdan Costescu (bogdan.costescu_at_[hidden])
Date: 2004-06-10 15:16:44


On Wed, 9 Jun 2004, Ricardo Fonseca wrote:

> i.e. lam boots only now and again with the same type of error messages.

Sorry, I don't quite understand what you mean by "lam boots only
now and again". At the end of the thread, I showed that it only
boots successfully on SMP nodes (where slots >= 2).

> Are there any news regarding this?

No... There was also a related query on the SGE list last week, but no
reaction from the SGE developers. I don't know how things have evolved in
the new version (6.0, which was supposed to be released these days), but
for 5.3 you first need to do (in the qrsh-remote step of the sge-lam
script) a normal rsh/ssh to the remote node and then set the environment
variables SGE_ROOT, SGE_CELL, JOB_ID and SGE_TASK_ID before starting
hboot on that node. The second step (qrsh-local) should be kept as is
in the sge-lam script, using 'qrsh -inherit', which allows the tight
integration.
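
As a rough illustration of the first step only, the remote command could be
assembled along these lines. This is a sketch, not the actual sge-lam code:
the function name, the use of 'env' on the remote side, and the hboot
arguments are all assumptions made for the example. It copies the four
variables from the job's environment and places them in front of hboot in
the command handed to rsh/ssh.

```python
import os
import shlex

# The four variables named above that must be set on the remote node.
SGE_VARS = ("SGE_ROOT", "SGE_CELL", "JOB_ID", "SGE_TASK_ID")


def build_remote_hboot_command(host, hboot_args, env=None, remote_shell="ssh"):
    """Return an argv list that runs hboot on `host` with the SGE variables set.

    Hypothetical helper: `hboot_args` is whatever the surrounding script
    would normally pass to hboot; nothing here is taken from sge-lam itself.
    """
    env = os.environ if env is None else env
    missing = [name for name in SGE_VARS if name not in env]
    if missing:
        raise KeyError("missing SGE variables: " + ", ".join(missing))

    # Quote every value so the remote shell sees it as a single word.
    assignments = [f"{name}={shlex.quote(env[name])}" for name in SGE_VARS]
    remote = " ".join(
        ["env", *assignments, "hboot", *(shlex.quote(a) for a in hboot_args)]
    )
    return [remote_shell, host, remote]
```

The returned list can be passed to subprocess.run() from within the job,
with remote_shell set to "rsh" where that is what the cluster uses. The
qrsh-local step is not touched by this.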

I have been using this setup since a week or two after the thread ended
and it has worked fine so far. But I would like to find another solution,
as this requires unrestricted rsh/ssh access to the compute nodes, which
I'd like to avoid...

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]