Hello,
Just an off-chance guess:
Is your /tmp on node n0 not a real local disk? I.e. is it on an
NFS mountpoint, or something out-of-the-ordinary?
LAM keeps various internal files in /tmp on each node, some of them
special files. What it keeps there has changed from version to version,
and one of those changes caused problems when /tmp is on NFS.
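
If it helps, here is a rough way to check what kind of filesystem /tmp
sits on. It is only a sketch, assuming a Linux node where /proc/mounts
is readable; the function names and the set of "network" filesystem
types are my own illustration and have nothing to do with LAM itself.

```python
import os

# Assumption: filesystem type names that usually mean "not a local disk".
NETWORK_FS = {"nfs", "nfs4", "afs", "cifs", "smbfs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, fs type, options, ...
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best_mp, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        mp = mount_point.rstrip("/") or "/"
        inside = mp == "/" or path == mp or path.startswith(mp + "/")
        # ">=" lets a later entry for the same mount point win, since
        # later mounts shadow earlier ones.
        if inside and len(mp) >= len(best_mp):
            best_mp, best_type = mp, fs_type
    return best_type


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        text = f.read()
    fs = fs_type_for(os.path.realpath("/tmp"), text)
    kind = "network" if fs in NETWORK_FS else "not a known network fs"
    print("/tmp is on", fs, "(" + kind + ")")
```

Run it on n0 itself. If it reports nfs (or another network type) for
/tmp, that would fit the guess above.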
--
Tim Mattox - tmattox_at_[hidden] - http://homepage.mac.com/tmattox/
http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/
On Tue, 27 Jan 2004, Sam Watson wrote:
> Hello. I'm trying to bring up LAM 7.0.3 on an x86 machine without
> much success. When I run either mpirun or lamexec I get a "Bad file descriptor"
> message. I'm also seeing this problem with 6.5.9, but not 6.5.1.
> I've tried to reduce it to the simplest possible example: here it
> is with 6.5.9, running on a single node:
>
> >/proj/lam/6.5.9/bin/lamnodes
> n0 sleet.exa.com:1
> >/proj/lam/6.5.9/bin/lamexec n0 ps
> lamexec (set_stdio): Bad file descriptor
>
> I've run this under strace, and the file descriptor is a socket. I've
> done the same with 6.5.1, and the sequence of system calls is almost
> the same, but the call to sendmsg that works in 6.5.1 fails in 6.5.9
> and 7.0.3.
>
> This is obviously not happening to everyone, and I'm trying to
> figure out what it is about our environment that is causing it.
> I've attached the compressed config.log from the 7.0.3 build.
> Thanks for any help you can provide.
>
>