Brian W. Barrett wrote:
> On Fri, 13 Jun 2003, Daniel Rohe wrote:
>
>
>>I've just noticed that we have a seemingly identical problem on a
>>32-node Linux cluster running SuSE-7.3 with LAM version 6.5.4 (at least
>>that's what it says in the man page).
>>
>>We've been using the cluster for quite a while, but the problems have
>>arisen only lately (or maybe we hadn't realized!?).
>>
>>Anyway, I've read the messages in this thread, but I'm not sure
>>what we should do.
>
>
> My guess would be that you have been having performance problems for some
> time, but only recently noticed the problem. As far as I know, there have
> not been any major performance bugs in the recent Linux kernel versions
> (there were some in the early 2.2 series). You could always back out
> previous patches to make sure something didn't change there.
>
> In most cases, performance problems are actually due to the application
> more than anything. TCP is pretty low bandwidth and high latency, so
> applications very sensitive to either of these are going to have lower
> than optimal CPU utilization. You might want to use a profiling tool
> such as XMPI or MPE to see if there are obvious places in your application
> where communication cost can be reduced.
>
> Hope this helps,
Not really. It does seem to be a kernel-related problem. We've updated two nodes to SuSE-8.2 and things look much better.
Daniel