
LAM/MPI General User's Mailing List Archives

From: Jerome BENOIT (jgmbenoit_at_[hidden])
Date: 2003-04-30 11:28:59


I am running LAM 6.5.9 on a diskless network
(i.e. the whole file system is mounted through NFS
from a single disk).

Brian W. Barrett wrote:
> This is not the best way to do things. There are two better ways: pointing
> TMPDIR at a small tmpfs (there is actually a LAM_MPI_SESSION_PREFIX
> option to override the TMPDIR option, so you could use the tmpfs only for
> LAM's session directories, which are normally pretty small) or for the LAM
> authors to make sure we don't try to rmdir the directory until the files
> are all actually removed from there.
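As a rough illustration of Brian's first option, the lookup order he describes (a `LAM_MPI_SESSION_PREFIX` setting that overrides `TMPDIR`) might be sketched as below. This is not LAM's code: treating both settings as environment variables, falling back to `/tmp`, and the helper name `session_dir` are assumptions; only the `lam-<USER>@<HOSTNAME>` naming comes from the directory seen in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not LAM's API): build the per-user session path.
 * Assumed precedence: LAM_MPI_SESSION_PREFIX, then TMPDIR, then /tmp.
 * Returns 0 on success, -1 if the path does not fit in buf. */
int session_dir(char *buf, size_t len)
{
    const char *prefix = getenv("LAM_MPI_SESSION_PREFIX");
    if (prefix == NULL || *prefix == '\0')
        prefix = getenv("TMPDIR");
    if (prefix == NULL || *prefix == '\0')
        prefix = "/tmp";

    const char *user = getenv("USER");
    if (user == NULL || *user == '\0')
        user = "unknown";

    char host[256];
    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "localhost");
    host[sizeof host - 1] = '\0'; /* gethostname may not terminate on truncation */

    /* Same naming as the directory in this thread: <prefix>/lam-<USER>@<HOSTNAME> */
    int n = snprintf(buf, len, "%s/lam-%s@%s", prefix, user, host);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under these assumptions, pointing the prefix at a small tmpfs such as `/dev/shm` would keep only the small session files in RAM while `/tmp` itself stays on NFS.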

I looked at this in the `tkill.c' source file
and traced it with print statements to figure out what is going on:
while the step
"/* Did we remove everything successfully? */"
passes,
the step
"/* Move back one directory so that we can remove the target dir */"
gets `tkill' stuck:
the `rmdir' call (line 349) returns `ENOTEMPTY'.
A check after `tkill' has quit
shows that the directory `/tmp/lam-<USER>@<HOSTNAME>' is empty:
I can remove it by hand with the command line
`rmdir /tmp/lam-<USER>@<HOSTNAME>'.
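The cleanup order traced above (delete every entry, then `rmdir` the target and inspect `errno`) can be reproduced outside LAM with a small sketch like the one below. It is not the `tkill.c` code: the function name `remove_session_dir` is hypothetical, it uses the full path instead of moving back one directory, and it assumes the session directory holds only plain files (no subdirectories).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch (not tkill.c): delete every entry of dir, assumed to
 * hold only plain files, then rmdir() the directory itself.
 * Returns 0 on success or the errno value of the first failing call. */
int remove_session_dir(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return errno;

    char path[4096];
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path) {
            closedir(d);
            return ENAMETOOLONG;
        }
        /* ENOENT is tolerated: readdir() may still report an entry that
         * has already been unlinked during this scan. */
        if (unlink(path) != 0 && errno != ENOENT) {
            int err = errno;
            closedir(d);
            return err;
        }
    }
    closedir(d); /* drop our own directory descriptor before rmdir() */

    /* Every unlink() succeeded, yet on NFS this rmdir() is where ENOTEMPTY
     * was observed even though `ls` showed an empty directory. */
    if (rmdir(dir) != 0)
        return errno;
    return 0;
}
```

On a local file system this returns 0; on the diskless setup described here, the expectation from the trace is a return value of `ENOTEMPTY`.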

Now, if I add a new loop to hold the program at this point
[`else if (errno == ENOTEMPTY) { sleep(2); continue; }'
between lines 352 and 353],
the loop seems infinite:
the directory `/tmp/lam-<USER>@<HOSTNAME>' is empty (according to `ls'),
BUT I cannot remove it with `rmdir'
(which reports that the directory is not empty).
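One way to see what `rmdir` is objecting to is to list every entry, including names that start with a dot, which a plain `ls` does not show (`ls -a` does). Timothy's explanation further down (NFS renames a file that is still open instead of deleting it) points at such entries; on Linux NFS clients these renamed files are typically named `.nfs` followed by hex digits. The helper `count_leftovers` below is a hypothetical diagnostic, not part of LAM.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic: print every entry of dir except "." and "..",
 * including hidden ones that a plain `ls` omits. Returns the number of
 * entries found (or -1 if dir cannot be opened); *nfs_count receives how
 * many look like NFS "silly rename" files (names starting with ".nfs"). */
int count_leftovers(const char *dir, int *nfs_count)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int total = 0, nfs = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        total++;
        if (strncmp(e->d_name, ".nfs", 4) == 0)
            nfs++;
        printf("leftover: %s\n", e->d_name);
    }
    closedir(d);
    if (nfs_count != NULL)
        *nfs_count = nfs;
    return total;
}
```

If this reports a `.nfs...` entry while `rmdir` fails, some process still holds the original file open, which matches the behaviour Timothy describes.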

I hope that helps.

Jerome

>
> We were under the impression that we had managed to clean up after
> ourselves in 6.5.9, but it is possible we forgot something. If you are
> using 6.5.9, could you please let us know what remains in the tmpdir after
> a lamhalt? If you are using a version of LAM previous to 6.5.9, could you
> please try 6.5.9?
>
> Moving the session directory is probably not the best way to deal with the
> problem - the right way is really to make sure that there are no open
> files when tkill calls rmdir(), which should be doable.

Clearly "moving instead of removing" the directory is a dirty trick.
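For comparison, the "moving instead of removing" trick from the earlier message quoted below could be sketched as follows: at shutdown the session directory is renamed to a unique stale name instead of being removed, and the next boot sweeps away stale directories left by earlier runs. The helper names `retire_session_dir` and `sweep_stale_dirs` and the `.stale.<pid>` suffix are hypothetical, and the sweep assumes each stale directory contains only plain files.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: rename dir to "<dir>.stale.<pid>" instead of rmdir(). */
int retire_session_dir(const char *dir)
{
    char stale[4096];
    int n = snprintf(stale, sizeof stale, "%s.stale.%ld", dir, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof stale)
        return ENAMETOOLONG;
    return rename(dir, stale) == 0 ? 0 : errno;
}

/* Best effort: unlink the plain files inside dir, then rmdir() it. */
static int remove_flat_dir(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    char path[4096];
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        unlink(path); /* failures surface as an rmdir() error below */
    }
    closedir(d);
    return rmdir(dir);
}

/* At the next boot: remove entries of parent named "<base>.stale.*".
 * Returns how many stale directories were removed, or -1 on error. */
int sweep_stale_dirs(const char *parent, const char *base)
{
    DIR *d = opendir(parent);
    if (d == NULL)
        return -1;
    int removed = 0;
    size_t blen = strlen(base);
    char path[4096];
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, base, blen) != 0 ||
            strncmp(e->d_name + blen, ".stale.", 7) != 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", parent, e->d_name);
        if (remove_flat_dir(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```

The per-process suffix keeps two renames from colliding. The sweep still hits the same `ENOTEMPTY` until whatever held the silly-renamed files open has exited, which is why closing those files before `rmdir()` remains the cleaner fix.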

>
> Brian
>
>
> On Wed, 30 Apr 2003, Jerome BENOIT wrote:
>
>
>>Hello,
>>
>>thanks for your insight.
>>
>>The `$TMPDIR' workaround is certainly a good idea,
>>but I guess that the problem will remain the same
>>on diskless computers.
>>
>>Nevertheless, a trick could be applied:
>>the idea is to rename
>>the `/tmp/lam-<USER>@<HOST>' directory
>>(instead of trying to remove it)
>>and to remove the renamed directory
>>at the next `lamboot'.
>>In fact if you replace `rmdir' by `rename' in the
>>`LAM/tools/tkill/tkill.c' C source file
>>the `lamboot/lamhalt' cycle works on [my local]
>>diskless computers.
>>
>>Is it really a good idea?
>>
>>May I send a clean patch file?
>>
>>Thanks,
>>Jerome
>>
>>
>>Timothy I Mattox wrote:
>>
>>>Hello,
>>>Sorry I didn't notice your e-mail earlier. We ran into that exact issue
>>>on KLAT2, and I debugged it (at least to my satisfaction) back in an
>>>earlier version of LAM. However, my patches didn't solve
>>>the problem in all cases. Basically the problem is that NFS will rename
>>>a file rather than delete it if there is a live file-descriptor for the
>>>file (i.e. it is still "open" by something). The details are beyond
>>>the scope of this e-mail, but suffice to say, there wasn't an elegant
>>>solution for patching LAM as it was then (summer of 2001).
>>>
>>>My current best solution for diskless nodes with LAM, is to use tmpfs
>>>to create a RAM based /tmp. It's not ideal, since /tmp would now be
>>>very limited in size, and any files there would directly compete with your
>>>parallel program for real RAM space in the node.
>>>In RedHat 7.x and later, you can look at the /dev/shm entry in /etc/fstab
>>>for an example use of tmpfs. You'll need to add a line to /etc/fstab
>>>that looks like this:
>>>none /tmp tmpfs defaults 0 0
>>>
>>>I don't know if the upcoming 7.0 version really fixes this issue,
>>>but at least with the option to change the location of those special
>>>files, more elegant solutions might present themselves.
>>
>>
>>_______________________________________________
>>This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
>