LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-06-13 11:51:28


On Jun 13, 2005, at 12:07 PM, Pierre Valiron wrote:

>>> If the daemons are attached to a volatile directory, lamhalt runs
>>> into trouble when the directory is destroyed. If the daemons are
>>> attached to a permanent one, then lamhalt gets all the time it needs
>>> to kill the LAM universe and remove the corresponding files.
>>
>> Unfortunately, I'm still confused. :-)
>>
>> Are you saying that you're removing $TMPDIR before running lamhalt?
>> Or perhaps removing it immediately after lamhalt completes?
>>
> Yes, I remove $TMPDIR just *after* lamhalt completes. Before would be
> just silly of course...

Understood, but I had to ask. :-)

> I have tried some sleep values; unfortunately, nothing seems very
> reliable because the machine may be very busy with other calculations,
> especially if heavy I/O is underway...

Erf. Gotcha.

> Thus I suggest adding a specific caveat to the man page and docs:
> lamhalt is asynchronous, so the sockets should not be deleted by the
> user but rather by LAM itself, and users should rely on
> LAM_MPI_SESSION_SUFFIX to generate several unrelated LAM universes
> under a permanent LAM_MPI_SESSION_PREFIX.

A better workaround for you might be, instead of sleeping, to wait
until the file "lam-io-socket" disappears from the LAM session
directory. This is the last file that is removed before the directory
itself is deleted (and the directory will only be deleted if it is
empty). Normally, there will never be any other files in the session
directory and the whole directory disappears, but watching for the
disappearance of that last file will cover you in all cases.
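
Waiting for that file could look something like the following. This is
only a rough sketch in Python, not anything that ships with LAM: the
function name wait_for_lam_shutdown, the timeout, and the poll interval
are made up for illustration, and you have to pass in the path of your
own session directory.

```python
import os
import time


def wait_for_lam_shutdown(session_dir, timeout=60.0, poll_interval=0.5):
    """Wait until "lam-io-socket" is gone from session_dir.

    Hypothetical helper: session_dir is whatever LAM session directory
    your setup uses. Returns True once the file (or the whole directory)
    has disappeared, False if the timeout expires first.
    """
    socket_path = os.path.join(session_dir, "lam-io-socket")
    deadline = time.monotonic() + timeout
    # lexists() is False both when the file is gone and when the
    # directory itself has already been removed.
    while os.path.lexists(socket_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

You would call it right after lamhalt returns, and remove $TMPDIR only
once it returns True.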

Would that work?

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/