This is not the best way to do things. There are two better ways: point
TMPDIR at a small tmpfs (there is actually a LAM_MPI_SESSION_PREFIX
option that overrides TMPDIR, so you could use the tmpfs only for
LAM's session directories, which are normally pretty small), or have the
LAM authors make sure we don't try to rmdir the directory until all the
files in it have actually been removed.
We were under the impression that we had managed to clean up after
ourselves in 6.5.9, but it is possible we forgot something. If you are
using 6.5.9, could you please let us know what remains in the tmpdir after
a lamhalt? If you are using a version of LAM previous to 6.5.9, could you
please try 6.5.9?
Moving the session directory is probably not the best way to deal with the
problem - the right way is really to make sure that there are no open
files when tkill calls rmdir(), which should be doable.
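
A minimal sketch of that ordering, assuming a session directory that holds
a single file. This is not the actual tkill code: the helper names
(create_session_file, remove_session_dir) and the one-file layout are made
up for illustration. The point is only that the descriptor is closed
before unlink(), so an NFS client has no reason to keep a renamed
placeholder in the directory when rmdir() runs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "dir/file" into buf; returns 0 on success, -1 if it does not fit. */
static int join_path(char *buf, size_t len, const char *dir, const char *file)
{
    int n = snprintf(buf, len, "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: create the session directory (if needed) and one
 * file inside it, returning an open descriptor for that file. */
int create_session_file(const char *dir, const char *file)
{
    char path[4096];
    assert(dir != NULL && file != NULL);

    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return -1;
    if (join_path(path, sizeof(path), dir, file) != 0)
        return -1;
    return open(path, O_CREAT | O_RDWR, 0600);
}

/* Hypothetical helper: tear the session directory down again.
 * Order matters: close the descriptor, then unlink, then rmdir. */
int remove_session_dir(const char *dir, const char *file, int fd)
{
    char path[4096];
    assert(dir != NULL && file != NULL);

    if (join_path(path, sizeof(path), dir, file) != 0)
        return -1;
    if (fd >= 0 && close(fd) != 0)          /* no live descriptor left */
        return -1;
    if (unlink(path) != 0 && errno != ENOENT)
        return -1;
    return rmdir(dir);                      /* directory should now be empty */
}
```

On a local filesystem rmdir() would succeed even with the descriptor still
open; the ordering only matters where an open, unlinked file keeps a
directory entry, as on NFS.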
Brian
On Wed, 30 Apr 2003, Jerome BENOIT wrote:
> Hello,
>
> thanks for your insight.
>
> The `$TMPDIR' workaround is certainly a good idea,
> but I guess that the problem will remain the same
> on diskless computers.
>
> Nevertheless, a trick could be applied:
> the idea is to rename
> the `/tmp/lam-<USER>@<HOST>' directory
> (instead of trying to remove it)
> and to remove the renamed directory
> at the next `lamboot'.
> In fact if you replace `rmdir' by `rename' in the
> `LAM/tools/tkill/tkill.c' C source file
> the `lamboot/lamhalt' cycle works on [my local]
> diskless computers.
>
> Is it really a good idea?
>
> May I send a clean patch file?
>
> Thanks,
> Jerome
>
>
> Timothy I Mattox wrote:
> > Hello,
> > Sorry I didn't notice your e-mail earlier. We ran into that exact issue
> > on KLAT2, and I debugged it (at least to my satisfaction) back in an
> > earlier version of LAM. However, my patches didn't solve
> > the problem in all cases. Basically the problem is that NFS will rename
> > a file rather than delete it if there is a live file-descriptor for the
> > file (i.e. it is still "open" by something). The details are beyond
> > the scope of this e-mail, but suffice to say, there wasn't an elegant
> > solution for patching LAM as it was then (summer of 2001).
> >
> > My current best solution for diskless nodes with LAM is to use tmpfs
> > to create a RAM based /tmp. It's not ideal, since /tmp would now be
> > very limited in size, and any files there would directly compete with your
> > parallel program for real RAM space in the node.
> > In RedHat 7.x and later, you can look at the /dev/shm entry in /etc/fstab
> > for an example use of tmpfs. You'll need to add a line to /etc/fstab
> > that looks like this:
> > none /tmp tmpfs defaults 0 0
> >
> > I don't know if the upcoming 7.0 version really fixes this issue,
> > but at least with the option to change the location of those special
> > files, more elegant solutions might present themselves.
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/