
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-06-14 11:29:36


On Jun 13, 2005, at 4:08 PM, Pierre Valiron wrote:

>> What might be a better workaround for you would be, instead of sleep,
>> test when the file "lam-io-socket" disappears from the LAM session
>> directory. This is the last file that is removed before the directory
>> itself is deleted (and the directory will only be deleted if it is
>> empty). Normally, there will never be any other files in the session
>> directory and the whole directory disappears, but looking for the
>> disappearance of that last file will definitely cover you in all
>> cases.
>
> Probably yes, but this is neither elegant nor easy to implement in a
> script...
> And the machine would stay inactive until these files get deleted,
> which is bad if many short LAM jobs are to be processed.

I actually just committed changes to lamhalt to effect this behavior by
default (i.e., wait for the socket to go away). It'll be in 7.1.2 and
tomorrow's nightly tarball.
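
For scripts that are stuck with an older lamhalt, the wait described above can be sketched in a few lines of Python. This is only a sketch under assumptions: the function name, the timeout, and the polling interval are illustrative choices, and the caller has to supply the real path of the session directory, which is not spelled out here.

```python
import os
import time


def wait_until_gone(path, timeout=30.0, interval=0.1):
    """Poll until `path` no longer exists.

    Returns True if the path disappeared before `timeout` seconds elapsed,
    False otherwise. The name and default values are illustrative only.
    """
    deadline = time.monotonic() + timeout
    # lexists() is also true for sockets and dangling symlinks.
    while os.path.lexists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Hypothetical usage; `session_dir` must be the actual LAM session directory:
#   wait_until_gone(os.path.join(session_dir, "lam-io-socket"))
```

A short polling interval keeps the idle time between consecutive short jobs small, and the timeout keeps a script from hanging forever if the file never goes away.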

> I have already given the answer: using LAM_MPI_SESSION_SUFFIX to label
> a unique lam universe lying on top of a permanent filesystem (such as
> /tmp).
> In this case the files are deleted asynchronously by lamhalt.

> And nothing prevents starting immediately another lam universe using a
> new unique LAM_MPI_SESSION_SUFFIX.

I think that all I'm trying to say here is that you don't need to use
LAM_MPI_SESSION_SUFFIX -- assigning a unique value to TMPDIR would do
the same thing.
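
As a rough illustration of the TMPDIR approach, the sketch below creates a fresh directory for each job and runs that job's commands with TMPDIR pointing at it. The helper name, the "lamjob-" prefix, and the command lines in the usage comment are assumptions made for the example, not something LAM prescribes.

```python
import os
import subprocess
import tempfile


def run_with_private_tmpdir(commands, base="/tmp"):
    """Run each command with TMPDIR set to a newly created, unique directory.

    Returns the directory path. The directory is deliberately NOT removed
    here, so nothing is pulled out from under a cleanup still in progress.
    """
    # mkdtemp() guarantees a directory name not used by any other job.
    tmpdir = tempfile.mkdtemp(prefix="lamjob-", dir=base)
    env = dict(os.environ, TMPDIR=tmpdir)
    for cmd in commands:
        subprocess.run(cmd, env=env, check=True)
    return tmpdir


# Hypothetical usage; the command lines are illustrative:
#   run_with_private_tmpdir([["lamboot", "hostfile"],
#                            ["mpirun", "C", "./a.out"],
#                            ["lamhalt"]])
```

Because every job gets its own directory, the next job can start right away without waiting for the previous one to finish cleaning up.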

Also, removing the directory immediately after exiting [old] lamhalt --
regardless of whether you use TMPDIR or LAM_MPI_SESSION_SUFFIX -- will
be problematic.

> The only point I am insisting on is to put these considerations
> somewhere in the doc, because I have already wasted a whole day
> understanding this strange behaviour, and it would be nice if I could
> spare another lam user the same horror story!

Completely understandable, and I thank you for bringing it to our
attention (you're actually cited in the HISTORY file now :-) ). I hope
the updated lamhalt will prevent others from running into this problem,
and we now include a lamhalt man page (I just updated it with all this
new information, but no one will notice the update, because the man
page was mistakenly left out of earlier tarballs).

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/