On Thu, 8 Jan 2004, Neil Storer wrote:
> Because IBM's AIX implementation of MPI does not support process
> creation and management, some of our users have asked if it would be
> possible to install LAM-MPI on our system.
By "process creation" do you mean things like fork+exec or thread creation ?
By "management" do you mean signals or something else ?
Have you already tested that LAM-MPI works well with your application ?
There are issues that you have to be aware of (like serializing thread
access to communicators, or which signals can be used); they are well
documented on the LAM-MPI website.
> over 50 SMP nodes, each node being a 32-CPU POWER4 p690 server
Nice system :-)
Given the size of the purchase, are you not able to convince IBM to add
the features that you need to their MPI implementation ?
> 1. Do you see any problem in installing LAM-MPI on such a system at that
> level of operating system?
Well, if you want to use all CPUs for one job... that would mean 1600
CPUs. Maybe the llamas can comment on scalability of LAM-MPI (how fast to
lamboot, how fast to start the job at mpirun time, how fast to lamhalt,
whether the LAMD-LAMD communication still runs smoothly); the tens of CPUs
per job that I'm seeing here certainly do not count as a comparison :-)
> 3. Do you see any reason why using LAM-MPI would not be a good thing to
> do on our system?
Yes, your next point :-)
> 4. IBM's MPI uses LAPI as the underlying protocol and this will be tuned
> and developed for Federation. My understanding is that LAM-MPI would
> have to use TCP/IP on our system
Yes, unless you volunteer to port LAM-MPI to LAPI :-)
> with a possible degradation in performance caused by this.
That's very mildly put :-) I have worked with MPI on LAPI only on old SP2
IBMs, but I had the impression that the scaling was quite good. On the
other hand, MPI over TCP/IP would probably scale very poorly for jobs
using hundreds of CPUs... unless they communicate between nodes only
rarely - which is not my impression of weather forecasting codes :-)
> Is LAM-MPI "LoadLeveler-aware"
Not that I'm aware of. Integration is there for *PBS and (partly?) SGE.
> can I set up a BATCH job that LoadLeveler will schedule onto a set of
> nodes that will then be used by my (LAM-)MPI program?
That is a different question :-) If LoadLeveler can give a list of nodes
allocated for the job, you can write a short script that converts this to
a LAM-MPI bhosts list that would be used by lamboot. However starting the
LAMDs during lamboot might be a problem... your point 7 below.
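A minimal sketch of such a conversion script, in Python. It assumes that
LoadLeveler exports the allocated hosts in the LOADL_PROCESSOR_LIST
environment variable as a whitespace-separated list with one entry per task
(please check this against your LoadLeveler documentation; I have not tested
it on your system). The function name hosts_to_bhost is made up for
illustration. The output uses the "hostname cpu=N" form of a LAM boot schema.

```python
import os
import sys
from collections import Counter


def hosts_to_bhost(processor_list: str) -> str:
    """Convert a whitespace-separated host list into LAM boot schema lines.

    A host that appears N times (one entry per task) becomes one
    "host cpu=N" line. Hosts keep the order in which they first appear.
    """
    counts = Counter(processor_list.split())
    return "".join(f"{host} cpu={n}\n" for host, n in counts.items())


if __name__ == "__main__":
    # Assumption: LoadLeveler sets LOADL_PROCESSOR_LIST inside the batch job.
    sys.stdout.write(hosts_to_bhost(os.environ.get("LOADL_PROCESSOR_LIST", "")))
```

Inside the batch job you would redirect the output to a file and pass that
file to lamboot. This only produces the host list; it does not solve the
problem of starting the LAMDs on the remote nodes.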
> 7. We only allow a very small INTERACTIVE service that is restricted to
> using just a few CPUs in one node. "telnet", "rsh", "ssh" and so on are
> prevented from running on the other nodes as these are reserved for
> BATCH jobs. Since it isn't possible to "rsh" onto the BATCH nodes, would
> this be a problem for LAM-MPI?
Yes. lamboot needs LAMRSH (by default rsh, but can be changed) to start a
LAM daemon on a remote node. This is where the "awareness" of the batch
system comes into play; for *PBS, LAM daemons can be started using TM, so
there is no need for LAMRSH...
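To illustrate what "can be changed" means: a small sketch that prepares a
lamboot call with a different remote agent in the LAMRSH environment
variable. The helper name lamboot_invocation and the "ssh -x" value are only
examples; on your system whatever agent you choose would still have to be
allowed on the BATCH nodes, which is exactly the problem.

```python
import os


def lamboot_invocation(bhost_path, remote_agent="ssh -x"):
    """Return (argv, env) for running lamboot with a custom remote agent.

    Nothing is executed here; the caller decides when to run the command.
    """
    env = dict(os.environ)        # copy, so the caller's environment is untouched
    env["LAMRSH"] = remote_agent  # lamboot uses this instead of the default rsh
    return ["lamboot", bhost_path], env
```

The result could then be handed to subprocess.run(argv, env=env, check=True)
from within the job script.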
> 8. Some of our programs use a hybrid parallel programming paradigm of MPI
> across nodes, with OpenMP within the nodes.
What compiler will you be using that supports OpenMP ?
> For this of course we have to use thread-safe libraries. Can
> such an approach be utilised with LAM-MPI too?
http://www.lam-mpi.org/faq/category7.php3#question5
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]