LAM/MPI General User's Mailing List Archives

From: sean dettrick (sdettrick_at_[hidden])
Date: 2004-12-16 20:46:30


>From: Robin Humble <rjh_at_[hidden]>
>Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>To: General LAM/MPI mailing list <lam_at_[hidden]>
>Subject: Re: LAM: LAM7.1.1, OSX, usysv/sysv failing on some nodes, working on others
>Date: Thu, 16 Dec 2004 18:49:49 -0500
>
>On Thu, Dec 16, 2004 at 11:20:23PM +0000, sean dettrick wrote:
> >I have LAM7.1.1 running on a cluster of dual G5 nodes on OSX.
> >On some nodes LAM is working perfectly with usysv, sysv, and tcp RPI's, but
> >there are 4 nodes where the usysv and sysv RPI's intermittently fail to
>
>sounds like your app is crashing (or lamd is being untidy) and leaving
>shared memory areas lying around. once you run out of shared mem areas
>then the app doesn't run.
>on Linux you can use 'ipcs' to see these, and 'ipcrm' to delete them.
>I presume there's something similar in OSX.

Thanks, that seems to have cleared it up.

There's no native ipcs/ipcrm, but I downloaded one as recommended by the
Open Darwin Ports project (http://darwinports.opendarwin.org/). It works
nicely. I can see now that the presence of stale semaphores/shared memory
areas is preventing my usysv jobs from working.
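
For anyone hitting the same thing, here is a rough sketch of how the cleanup could be scripted. It is not what I actually ran. It assumes the BSD-style `ipcs` row layout (T ID KEY MODE OWNER GROUP) and that `ipcrm -m <id>` / `ipcrm -s <id>` are accepted by the installed port; the function names are made up for illustration. Note that it targets everything owned by the given user, not only stale entries, so it should only run when that user has no live jobs on the node.

```python
import getpass
import subprocess

# Assumed BSD-style `ipcs` rows: T ID KEY MODE OWNER GROUP
# "m" rows are shared memory segments, "s" rows are semaphore sets.
FLAG_FOR_TYPE = {"m": "-m", "s": "-s"}


def ipcrm_commands_for_owner(ipcs_output, owner):
    """Build one ipcrm argument list per shm/semaphore row owned by `owner`."""
    commands = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Skip headers, section titles, and any row type we do not handle.
        if len(fields) < 5 or fields[0] not in FLAG_FOR_TYPE:
            continue
        kind, ipc_id, row_owner = fields[0], fields[1], fields[4]
        if row_owner == owner and ipc_id.isdigit():
            commands.append(["ipcrm", FLAG_FOR_TYPE[kind], ipc_id])
    return commands


def cleanup(owner=None, dry_run=True):
    """Print (and, unless dry_run, execute) the ipcrm commands for `owner`."""
    owner = owner or getpass.getuser()
    result = subprocess.run(["ipcs"], capture_output=True, text=True, check=True)
    for cmd in ipcrm_commands_for_owner(result.stdout, owner):
        print(" ".join(cmd))
        if not dry_run:
            # Do not abort on a failed removal; the entry may already be gone.
            subprocess.run(cmd, check=False)
```

Calling `cleanup()` with the default `dry_run=True` only prints the commands, which is a reasonable first step before letting it remove anything.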

> >through PBS (using tm boot) or using rsh boot. Meanwhile tcp always works
> >perfectly on all nodes.
>
>typically people call an ipc cleanup script from PBS's epilogue script.

I'll play around with that.
Thanks again,
Sean