LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-06-13 08:02:40


On Jun 12, 2005, at 5:08 AM, dhruva shree agarwal wrote:

> I have a 4-node cluster of IBM Power5 machines with 4 procs/node. When I
> try to run HPL through LAM/MPI on this cluster, I notice that the
> processes are sleeping for a very long time and the performance
> achieved is very low.
> I am not able to understand why this is happening, whereas everything
> is fine with the ch_p4mpd device of MPICH and I am getting good
> performance.
> If I give only one process per node (only 4 processes on the
> cluster), everything is also fine, and with 4 processes on a single
> node (only one node occupied) everything is fine too.
> My hostfile also has lines like
> cnode21 cpu=4

I'm afraid there really isn't enough information to understand what
is going on with your setup. Which transport are you using: TCP,
GM, or one of the shared memory transports? Unless you are
explicitly setting the transport, you can run "laminfo -version rpi
full" to get a list of the transports available in your build. If the
shared memory RPIs (sysv or usysv) are not listed, you probably want
them built in to get better performance.
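
As a rough sketch of that check, the shell helpers below scan laminfo
output for a shared memory RPI. They assume the default laminfo listing
prints one module per line in the form "SSI rpi: <name> (...)"; the
function names list_rpis and has_shared_memory_rpi are made up for
illustration and are not part of LAM/MPI.

```shell
# Print the name of every RPI module found in laminfo output on stdin.
# Assumption: each module appears on a line like
#   "SSI rpi: tcp (API v1.0, Module v7.1)"
list_rpis() {
    awk '$1 == "SSI" && $2 == "rpi:" { print $3 }'
}

# Succeed (exit status 0) if sysv or usysv is among the listed RPIs.
has_shared_memory_rpi() {
    list_rpis | grep -Eq '^(sysv|usysv)$'
}

# Example use on a node where LAM/MPI is installed:
#   laminfo | has_shared_memory_rpi || echo "no shared memory RPI in this build"
```

If a shared memory RPI is present, it can usually be requested explicitly
at run time with something like "mpirun -ssi rpi usysv C ./xhpl", where
./xhpl stands in for wherever your HPL binary lives.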

The only other thing I can recommend is to use a profiling tool to
get some idea of what your application is doing. Or, if it's
spending most of its time sleeping, use gdb to find out where it is
sleeping.
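
One possible way to do the gdb step is sketched below. It assumes a
Linux-style ps whose "-eo pid,stat,comm" output has a header line and a
state column starting with S for sleeping processes, and that the HPL
binary is named xhpl; the helper name sleeping_pids is hypothetical.

```shell
# Read `ps -eo pid,stat,comm` output on stdin and print the PIDs of
# processes named "$1" whose state starts with S (sleeping).
# The first line is skipped because it is the ps header.
sleeping_pids() {
    awk -v name="$1" 'NR > 1 && $3 == name && $2 ~ /^S/ { print $1 }'
}

# Example use on a compute node while the job is running:
#   ps -eo pid,stat,comm | sleeping_pids xhpl | while read -r pid; do
#       gdb -p "$pid" -batch -ex bt    # attach, print a backtrace, detach
#   done
```

The backtrace from each sleeping process should show which call it is
blocked in, which is usually enough to tell whether the time is going
into the MPI library or into the application itself.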

Posting all the results of laminfo would be very useful in
determining where your performance issues might lie.

Thanks,

Brian