Sorry for the delay; this turned out to be quite a hectic week.
I'm wondering if the problem is your seemingly very old Linux kernel
(2.4.2-2smp). We've had a lot of problems with the file descriptor
passing code in LAM/MPI -- it's one of the most non-portable concepts in
Unix -- and Linux's support for it has changed a few times over
the years. :-(
We thought that the latest round of changes we made would be the universal
fix (LAM now supports 4 ways to do fd passing). I'm wondering if LAM's
configure script is picking the wrong one, or if the fd passing
support in Linux 2.4.2 is so different that LAM doesn't support it
(anymore).
From your config.log, it looks like LAM's configure script determined that
you have POSIX 1g fd passing. It's a little odd that it would succeed at
configure time but then fail at run time -- we literally use the same code
in configure as we do in the actual LAM library (we compile
share/etc/srfd.c in the configure test).
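
For anyone following along who hasn't seen this style of fd passing, here
is a rough sketch of the general idea: the descriptor travels as
SCM_RIGHTS ancillary data attached to a sendmsg() call on a Unix-domain
socket. This is Python rather than C, it is not the code in
share/etc/srfd.c, and the names send_fd, recv_fd, and roundtrip_demo are
made up purely for illustration.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor over a Unix-domain socket (hypothetical helper)."""
    fds = array.array("i", [fd])
    # One byte of ordinary data accompanies the ancillary (control) message.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor sent by send_fd() (hypothetical helper)."""
    fds = array.array("i")
    # Ask for room for exactly one descriptor in the control buffer.
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Keep only a whole number of ints from the control data.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
            return fds[0]
    raise OSError("no file descriptor received")


def roundtrip_demo():
    """Pass the write end of a pipe across a socketpair and write through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(left, write_end)
        passed = recv_fd(right)       # a new descriptor for the same pipe
        os.write(passed, b"hello")
        os.close(passed)
        return os.read(read_end, 5)
    finally:
        os.close(write_end)
        os.close(read_end)
        left.close()
        right.close()
```

If the kernel and the library disagree about how the control message is
laid out, the sendmsg() step is where I would expect the failure to show
up, which at least matches what the strace output points at.
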
So let's start with the whacked-out theories first -- are you
configuring/compiling LAM on a different system than the one you're
actually running your MPI applications on?
On Tue, 27 Jan 2004, Sam Watson wrote:
> Hello. I'm trying to bring up LAM 7.0.3 on an x86 machine without
> much success. When I run either mpirun or lamexec I get a "Bad file descriptor"
> message. I'm also seeing this problem with 6.5.9, but not 6.5.1.
> I've tried to reduce it to the simplest possible example: here it
> is with 6.5.9, running on a single node:
>
> >/proj/lam/6.5.9/bin/lamnodes
> n0 sleet.exa.com:1
> >/proj/lam/6.5.9/bin/lamexec n0 ps
> lamexec (set_stdio): Bad file descriptor
>
> I've run this under strace, and the file descriptor is a socket. I've
> done the same with 6.5.1, and the sequence of system calls is almost
> the same, but the call to sendmsg that works in 6.5.1 fails in 6.5.9
> and 7.0.3.
>
> This is obviously not happening to everyone, and I'm trying to
> figure out what it is about our environment that is causing it.
> I've attached the compressed config.log from the 7.0.3 build.
> Thanks for any help you can provide.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/