LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-06-07 23:58:59


On May 30, 2006, at 10:08 AM, YoungHui Amend wrote:

> I'm using LAM/MPI 7.1.2, and the LAM daemons are running.
> My main (master) process seems to be failing in MPI_Init. Here is
> my command and output.
> I've added some debug fprintf calls to the source code. It's failing
> at kinit(), but my debug output from inside that function never
> appears, and I don't know why. Any idea why kinit is failing?

> IN lam_linit - calling kenter /sbox/yamend/r33/amd64_linux24/64/bin/TWTgen
> IN lam_linit - before kenter errno = 0
> IN kenter - calling kinit
> IN lam_linit - errno = 471
> IN lam_linit - returning errno=1239

It looks like kinit is failing because it couldn't contact the kernel
process. This could happen if you are trying to start a whole lot of
processes (more than 64) on one node. Other than that, I'm not
really sure why it could be failing at that point. Could you try
stepping through the code with a debugger to figure out where the
error first occurs? An "easy" way to do this would be to start your
application in gdb running in an xterm:

   mpirun -np 1 xterm -e gdb ./myapp
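If an xterm is not available on the node, a common alternative is to have the process announce its PID and pause so that gdb can be attached by hand. The following is only a minimal sketch of that idea: wait_for_debugger and its hold flag are hypothetical names, not part of LAM/MPI, and the code assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not a LAM/MPI function): report where this process
 * is running, then wait until a debugger clears *hold or until max_seconds
 * have elapsed. Returns the number of seconds spent waiting. */
int wait_for_debugger(volatile int *hold, int max_seconds)
{
    char host[256];
    int waited = 0;

    if (gethostname(host, sizeof(host)) != 0) {
        snprintf(host, sizeof(host), "unknown");
    }
    host[sizeof(host) - 1] = '\0';  /* gethostname may not NUL-terminate */

    /* stderr plus an explicit flush, so the message is not lost in a buffer */
    fprintf(stderr, "PID %ld on %s waiting for debugger attach\n",
            (long) getpid(), host);
    fflush(stderr);

    while (*hold && waited < max_seconds) {
        sleep(1);
        waited++;
    }
    return waited;
}
```

The intended use would be to call this just before MPI_Init with a flag initialized to 1, run "gdb -p <PID>" on that node, set a breakpoint (for example on kinit, assuming LAM was built with debugging symbols), clear the flag with "set var *hold = 0" from the wait_for_debugger frame, and then continue.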

Can you include some more information about the platform you are
using and which compilers you used to build LAM?

Thanks,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/