LAM/MPI General User's Mailing List Archives

From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2004-01-08 13:25:34


On Jan 8, 2004, at 10:18 AM, Neil Storer wrote:

> We use IBM's "xlf95_r" compiler with the "-qsmp=omp" flag to
> multi-thread. Basically we use MPI at a very high level in the code
> and then use OpenMP within each of the MPI tasks. In this way we get
> much better scaling than using just MPI.
>
> My understanding is that the codes that will need to use LAM-MPI do
> not use OpenMP (but that could change, hence the query). Also, they
> won't be using all the CPUs (maybe a couple of hundred, not
> thousands).

Users have run Linpack with thousands of processes, but I wouldn't
really recommend doing it over TCP. Collective operations get really,
really painful. The case with a few hundred processes should behave
acceptably, though. Shared memory performance within a single node
should be reasonable under AIX, so if you can keep a job's processes
on a couple of machines, you can minimize the TCP costs.
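A rough way to see why locality matters is to count how many pairs of
ranks would have to talk over TCP. The sketch below assumes an
all-to-all pattern (every pair of ranks exchanges messages), that ranks
on the same node use shared memory, and that ranks on different nodes
use TCP. The function name and the 16-rank placements are hypothetical,
chosen only for illustration:

```python
from itertools import combinations


def tcp_pair_fraction(placement):
    """placement[r] is the node hosting MPI rank r.

    Returns (tcp_pairs, total_pairs): the number of unordered rank pairs
    on different nodes (assumed to use TCP) and the total number of pairs.
    Pairs on the same node are assumed to use shared memory.
    """
    pairs = list(combinations(range(len(placement)), 2))
    tcp = sum(1 for a, b in pairs if placement[a] != placement[b])
    return tcp, len(pairs)


# 16 ranks packed onto 2 nodes (8 per node) vs. spread over 8 nodes (2 per node).
packed = [r // 8 for r in range(16)]
spread = [r // 2 for r in range(16)]
print(tcp_pair_fraction(packed))  # (64, 120)
print(tcp_pair_fraction(spread))  # (112, 120)
```

Packing the same 16 ranks onto 2 nodes instead of 8 cuts the TCP pairs
from 112 to 64 out of 120. Real collectives do not touch every pair
directly, but the trend is the same.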

> One possible way around the "rsh" scalability problem would be
> to use "poe" (which is already integrated with LoadLeveler) as the
> mechanism for starting the LAM daemons, e.g.:
> ./configure --with-rsh="/usr/bin/poe"
> though this would need to be tested. If this worked OK, we could
> add a few simple commands to the LoadLeveler prolog and epilog scripts,
> and LAM-MPI could then be said to be LoadLeveler-aware.

The rsh scalability problem really doesn't exist for you: LAM only
needs to rsh into a machine once per run, regardless of how many MPI
processes end up on that node. Since you only have 50 nodes and will
probably run on only a small subset of them, that really isn't an
issue. The rsh problem tends to come up on clusters of O(100) nodes
or more.
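To make the "once per node" point concrete, here is a small sketch that
counts remote-shell launches against MPI process slots. It assumes a
simplified boot-schema-like host list: one host per line, an optional
`cpu=N` count (default 1), and `#` for comments. Merging a host that is
listed twice into a single launch is an assumption of this sketch, and
the function and host names are hypothetical:

```python
def rsh_launches(schema_text):
    """Return (remote_shell_launches, process_slots) for a host list.

    Simplified format (an assumption, not a full parser): one host per
    line, optional "cpu=N" (default 1), '#' starts a comment. Each
    distinct host costs one remote-shell launch, however many slots it has.
    """
    cpus = {}
    for raw in schema_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        host, n = fields[0], 1
        for field in fields[1:]:
            if field.startswith("cpu="):
                n = int(field[len("cpu="):])
        cpus[host] = cpus.get(host, 0) + n
    return len(cpus), sum(cpus.values())


schema = """
# 4 nodes, 16 CPUs each (hypothetical host names)
node01 cpu=16
node02 cpu=16
node03 cpu=16
node04 cpu=16
"""
print(rsh_launches(schema))  # (4, 64)
```

Running 64 processes here costs only 4 remote-shell launches. Even
using all 50 nodes would mean at most 50 launches per run.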

I don't know enough about LoadLeveler to know whether /usr/bin/poe
will work as the rsh agent. I would try it and see what happens. If
something doesn't work, please let us know. We are a little short on
AIX developers, but can definitely try to make LoadLeveler and LAM play
nice together.

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/