Hello,
thanks for your insight.
The `$TMPDIR' workaround is certainly a good idea,
but I guess that the problem will remain the same
on diskless computers.
Nevertheless, a trick could be applied:
the idea is to rename
the `/tmp/lam-<USER>@<HOST>' directory
(instead of trying to remove it)
and to remove the renamed directory
at the next `lamboot'.
In fact, if you replace `rmdir' with `rename' in the
`LAM/tools/tkill/tkill.c' C source file
the `lamboot/lamhalt' cycle works on [my local]
diskless computers.
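
To make the idea concrete, here is a minimal sketch of the two steps,
assuming POSIX `rename' and `rmdir'. It is not the actual code from
`tkill.c': the function names and the `.stale-<pid>' suffix are
hypothetical, chosen only for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper for the halt step: move the session directory
   aside instead of removing it. The new name is written to `stale`.
   Returns 0 on success, -1 on error (errno is set). */
int retire_session_dir(const char *dir, char *stale, size_t stale_len)
{
    /* The ".stale-<pid>" suffix is an assumption, not a LAM convention. */
    int n = snprintf(stale, stale_len, "%s.stale-%ld", dir, (long)getpid());
    if (n < 0 || (size_t)n >= stale_len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return rename(dir, stale);
}

/* Hypothetical helper for the next boot: remove a directory that was
   renamed earlier. Only an empty directory can be removed this way;
   a directory that is already gone is not treated as an error. */
int remove_stale_dir(const char *stale)
{
    if (rmdir(stale) == 0 || errno == ENOENT)
        return 0;
    return -1;
}
```

The first function would stand in for the failing `rmdir' at halt time,
and the second would run early in the next `lamboot', once nothing
holds the old files open any more.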
Is it really a good idea?
May I send a clean patch file?
Thanks,
Jerome
Timothy I Mattox wrote:
> Hello,
> Sorry I didn't notice your e-mail earlier. We ran into that exact issue
> on KLAT2, and I debugged it (at least to my satisfaction) back in an
> earlier version of LAM. However, my patches didn't solve
> the problem in all cases. Basically the problem is that NFS will rename
> a file rather than delete it if there is a live file-descriptor for the
> file (i.e. it is still "open" by something). The details are beyond
> the scope of this e-mail, but suffice to say, there wasn't an elegant
> solution for patching LAM as it was then (summer of 2001).
>
> My current best solution for diskless nodes with LAM is to use tmpfs
> to create a RAM-based /tmp. It's not ideal, since /tmp would now be
> very limited in size, and any files there would directly compete with your
> parallel program for real RAM space in the node.
> In RedHat 7.x and later, you can look at the /dev/shm entry in /etc/fstab
> for an example use of tmpfs. You'll need to add a line to /etc/fstab
> that looks like this:
> none /tmp tmpfs defaults 0 0
>
> I don't know if the upcoming 7.0 version really fixes this issue,
> but at least with the option to change the location of those special
> files, more elegant solutions might present themselves.