LAM/MPI General User's Mailing List Archives

From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2001-12-29 14:12:33


On Thu, 20 Dec 2001, Scott Morton wrote:

> >Can you provide some information about your setup? What operating system
> >are you using? How many nodes is your setup?
>
> These dual P4 nodes are running Red Hat 7.1 with 2.4.* kernels. The
> exact version varied from 2.4.9 to 2.4.13. I say varied because our
> sysadmin suspects that my problem could be due to a memory problem
> with the older (pre 2.4.10) kernels; so he upgraded all of the nodes
> to 2.4.13 and my code ran to completion. This is hardly conclusive,
> since my failure rate was only about 30%. But time will tell.
>
> If the trace back from my hung lamd doesn't give you any ideas, I
> suspect that the best way to proceed is for me to make a few more runs
> and see if the problem still occurs. If not, we can blame it on the
> kernel. If so, then we revisit it. Does that make sense to you?

It is possible that a kernel bug caused the problems you were seeing.
Running in lamd mode can be, well, abusive to the OS, and Linux isn't known
for its super-stable networking implementations.

Let us know if you are still having problems after a few more runs, and I
can investigate further.

Thanks,

Brian

-- 
  Brian Barrett
  LAM/MPI developer and all around nice guy
  Have a LAM/MPI day: http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/