LAM/MPI General User's Mailing List Archives

From: Robin Humble (rjh_at_[hidden])
Date: 2004-12-16 18:49:49


On Thu, Dec 16, 2004 at 11:20:23PM +0000, sean dettrick wrote:
>I have LAM7.1.1 running on a cluster of dual G5 nodes on OSX.
>On some nodes LAM is working perfectly with usysv, sysv, and tcp RPI's, but
>there are 4 nodes where the usysv and sysv RPI's intermittently fail to

sounds like your app is crashing (or lamd is being untidy) and leaving
shared memory segments lying around. once the system runs out of shared
memory segments, the app won't run.
on Linux you can use 'ipcs' to list these, and 'ipcrm' to delete them.
I presume there's something similar in OSX.

>through PBS (using tm boot) or using rsh boot. Meanwhile tcp always works
>perfectly on all nodes.

typically people call an ipc cleanup script from PBS's epilogue script.
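
a rough sketch of what such a cleanup could look like, in Python. this is
only an illustration under some assumptions, not a script anyone in this
thread is known to use: it assumes the Linux 'ipcs -m' column layout
(key, shmid, owner, perms, bytes, nattch, status), and the names
stale_shm_ids and cleanup are made up for the example. other systems print
different columns, so the field indices would need adjusting there.

```python
import subprocess


def stale_shm_ids(ipcs_output, owner):
    """Return ids of shared memory segments owned by `owner` that have
    no processes attached (nattch == 0).

    Assumes Linux-style `ipcs -m` rows:
        key shmid owner perms bytes nattch [status]
    """
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Skip banners, headers and blank lines: data rows start with a hex key.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, seg_owner, nattch = fields[1], fields[2], fields[5]
        if seg_owner == owner and nattch == "0":
            ids.append(shmid)
    return ids


def cleanup(owner):
    """List segments with `ipcs -m` and remove the stale ones with `ipcrm -m`."""
    out = subprocess.run(
        ["ipcs", "-m"], capture_output=True, text=True, check=True
    ).stdout
    for shmid in stale_shm_ids(out, owner):
        # check=False: a segment may already be gone by the time we get here.
        subprocess.run(["ipcrm", "-m", shmid], check=False)
```

calling cleanup() with the job owner's user name from the epilogue would
then remove only that user's unattached segments. segments that still have
processes attached are left alone, which is the cautious choice if the same
user might have another job running on the node.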

cheers,
robin