LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-01-16 00:32:48


On Jan 14, 2005, at 7:18 PM, Dale Harris wrote:

> I'm having a problem successfully running lamboot, from lam 7.1.1, on a
> bproc system running version 4.0p8. What I see from lamboot is:
>
> lamboot hosts
>
> LAM 7.1.1/MPI 2 C++/bproc - Indiana University
>
> lamd kernel: problem with socket(): Address family not supported by
> protocol
> ...

This is coming from a call to create a Unix domain socket:

         if ((sd_kernel = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
           lampanic("lamd kernel: problem with socket()");

I'm not sure how that call could fail with the given error message,
so I'm guessing that it's a symptom of the real problem rather than
the cause. I know that's not very helpful, but there isn't any obvious
reason that call to socket() should fail.
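
One way to test the socket() call in isolation is a small standalone
probe. The following is only a minimal sketch, not code from LAM: the
name probe_unix_socket and the PROBE_STANDALONE macro are made up for
illustration. It makes the same socket(AF_UNIX, SOCK_STREAM, 0) call as
lamd and reports the errno value if it fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of LAM): create a Unix domain stream
 * socket with the same arguments lamd uses. Returns 0 on success, or
 * the errno value set by socket() on failure. */
int probe_unix_socket(void)
{
    int sd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sd < 0)
        return errno;   /* e.g. EAFNOSUPPORT */
    close(sd);
    return 0;
}

/* Build with: cc -DPROBE_STANDALONE probe.c -o probe
 * (PROBE_STANDALONE is an illustrative macro, not a standard one). */
#ifdef PROBE_STANDALONE
int main(void)
{
    int err = probe_unix_socket();
    if (err != 0) {
        fprintf(stderr, "socket(AF_UNIX): %s\n", strerror(err));
        return 1;
    }
    puts("socket(AF_UNIX) ok");
    return 0;
}
#endif
```

Running the probe on the head node and then on a compute node (for
example through bpsh, if that is how you normally launch jobs) would
show whether Unix domain sockets fail everywhere or only in one place.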

> I was able to do a little strace of this, and see errors like:
>
> getxattr("/bpfs/-1", "bproc.addr", 0xbfffeff4, 16) = 16
> socket(PF_FILE, SOCK_STREAM, 0) = 3
> connect(3, {sa_family=AF_FILE, path="/var/run/.nscd_socket"}, 110) = -1
> ENOENT (No such file or directory)
> close(3) = 0
>
> But that doesn't make much sense to me; it looks like it's trying to
> resolve a name, perhaps. I assume this is a symptom, but not a cause.

Can you tell when that error message occurs? Perhaps there is
something wrong with the BProc cluster that is causing your errors. Do
other applications run properly on the compute nodes? Also, what
happens if you try to boot with no hostfile (so it just tries to start
on the BProc head node)?

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have an LAM/MPI day: http://www.lam-mpi.org/