- Next message: Jakub Szarlat: "LAM: blcr and mpi"
- Previous message: Fernando Robles Morales: "LAM: help"
- In reply to: sean dettrick: "LAM: LAM7.1.1, OSX, usysv/sysv failing on some nodes, working on others"
- Next in thread: sean dettrick: "Re: LAM: LAM7.1.1, OSX, usysv/sysv failing on some nodes, working on others"
- Reply: sean dettrick: "Re: LAM: LAM7.1.1, OSX, usysv/sysv failing on some nodes, working on others"
- Reply: sean dettrick: "Re: LAM: LAM7.1.1, OSX, usysv/sysv failing on some nodes, working on others"
On Thu, Dec 16, 2004 at 11:20:23PM +0000, sean dettrick wrote:
>I have LAM7.1.1 running on a cluster of dual G5 nodes on OSX.
>On some nodes LAM is working perfectly with usysv, sysv, and tcp RPI's, but
>there are 4 nodes where the usysv and sysv RPI's intermittently fail to
sounds like your app is crashing (or lamd is being untidy) and leaving
shared memory segments lying around. once the node runs out of shared
memory segments, the app can't start.
on Linux you can use 'ipcs' to see these, and 'ipcrm' to delete them.
I presume there's something similar in OSX.
>through PBS (using tm boot) or using rsh boot. Meanwhile tcp always works
>perfectly on all nodes.
typically people call an ipc cleanup script from PBS's epilogue script.
cheers,
robin
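
A minimal sketch of the kind of ipc cleanup script mentioned above, built on 'ipcs' and 'ipcrm'. It assumes the Linux (util-linux) 'ipcs' column order of key, id, owner; the OSX layout differs and the awk filter would need adjusting. The function names are made up for illustration, and the commented-out call assumes the PBS epilogue receives the job owner's user name as its second argument. Note that it removes every segment the user owns on the node, so it is only safe if that user has no other job running there.

```shell
#!/bin/sh
# Sketch of an IPC cleanup helper for a PBS epilogue script.
# Assumption: Linux (util-linux) `ipcs` layout, i.e. columns key, id, owner, ...

# Read `ipcs -m` or `ipcs -s` output on stdin and print the IDs owned by user $1.
# Header and separator lines are skipped because their first field is not a 0x key.
ipc_ids_for_user() {
    awk -v user="$1" '$1 ~ /^0x/ && $3 == user { print $2 }'
}

# Remove all shared memory segments and semaphore arrays owned by user $1.
cleanup_ipc() {
    user="$1"
    for id in $(ipcs -m | ipc_ids_for_user "$user"); do
        ipcrm -m "$id"
    done
    for id in $(ipcs -s | ipc_ids_for_user "$user"); do
        ipcrm -s "$id"
    done
}

# In an epilogue script (assumption: $2 is the job owner's user name):
# cleanup_ipc "$2"
```

Running 'ipcs -m' by hand on one of the failing nodes before and after a job is a quick way to check whether leftover segments are really the cause.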