Brian,
>> I thought that all the changes between 6.5.1 and 6.5.6 were of a
>> minor sort, so I hadn't upgraded. Should I?
>If nothing else, upgrading would make debugging easier - at least we will
>be using the same code base if this takes some work.
Certainly. I will.
>Can you provide some information about your setup? What operating system
>are you using? How many nodes is your setup?
These dual P4 nodes are running Red Hat 7.1 with 2.4.x kernels. The
exact version varied from 2.4.9 to 2.4.13. I say "varied" because our
sysadmin suspects that my problem could be due to a memory problem
with the older (pre-2.4.10) kernels, so he upgraded all of the nodes
to 2.4.13, and my code ran to completion. This is hardly conclusive,
since my failure rate was only about 30%. But time will tell.
If the traceback from my hung lamd doesn't give you any ideas, I
suspect that the best way to proceed is for me to make a few more runs
and see if the problem still occurs. If not, we can blame it on the
kernel. If so, then we revisit it. Does that make sense to you?
Scott
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/