
LAM/MPI General User's Mailing List Archives


From: daniel.egloff_at_[hidden]
Date: 2005-06-09 01:20:32


Dear LAM list

We have severe performance problems with MPI-IO / ROMIO on a 100-CPU
Intel Linux cluster with a GigE interconnect and a reasonable switch.

We tried several setups:

      - LAM / ROMIO with PVFS2 (has some limitations with MPI-2
        datatypes such as subarrays in combination with MPI_File_set_view)
      - MPICH2 / ROMIO with PVFS2

Both setups show similar problems, most likely because the parallel
file I/O layer is ROMIO in both cases.

I guess that our "problem" is rather standard:

      Simulation phase:
      Write a 4-dimensional cube to a file.
      The cube is generated by different processes in blocks.
      The blocks are obtained by slicing the cube along the last
      dimension, i.e. the dimension with stride 1 (C array order).

      Aggregation phase:
      Read the cube back from the file, this time sliced along the first
      dimension, i.e. the dimension with the largest stride (C array order).
      Every process reads a slice, which is now just a three-dimensional
      subcube, and processes it, until no more slices are left to process.
      (Fortunately one slice fits entirely in memory.)

The dimensions of the cube are 1000 x 50 x 50 x 1000000 doubles
= 2 x 10^13 bytes, i.e. in the range of 20 terabytes.
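
For concreteness, here is a minimal sketch (in C, with the dimensions
scaled down, the file name cube.dat made up for illustration, and
assuming the number of ranks evenly divides the last dimension) of the
write pattern we have in mind: each rank describes its slab of the
last dimension with MPI_Type_create_subarray, sets that as the file
view, and writes it with a collective MPI_File_write_all.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Global cube in C order; the real case is 1000 x 50 x 50 x 1000000,
     * the numbers here are scaled down so the example runs anywhere. */
    int gsizes[4]   = { 8, 50, 50, 1000 };

    /* Each rank owns a slab of the last (stride-1) dimension;
     * assumes nprocs divides gsizes[3] evenly. */
    int nslab       = gsizes[3] / nprocs;
    int subsizes[4] = { gsizes[0], gsizes[1], gsizes[2], nslab };
    int starts[4]   = { 0, 0, 0, rank * nslab };

    MPI_Datatype filetype;
    MPI_Type_create_subarray(4, gsizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    size_t nlocal = (size_t)subsizes[0] * subsizes[1]
                  * subsizes[2] * subsizes[3];
    double *buf = calloc(nlocal, sizeof(double)); /* the locally generated block */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "cube.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* Collective write: with a collective call ROMIO can use two-phase
     * collective buffering instead of per-process data sieving. */
    MPI_File_write_all(fh, buf, (int)nlocal, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}

The read-back in the aggregation phase is the simpler direction: a
slice along the first dimension is a contiguous region of the file in
C order, so a plain file view (or MPI_File_read_at at the right
offset) per slice should be enough.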

Questions:
      - Is such a problem doable with MPI-IO? Our tests indicate that
        the "data sieving" optimization done by ROMIO leads to an
        unacceptable performance degradation. A lot of communication
        happens inside MPI, probably done by ROMIO for some
        "optimization"? The theoretical I/O bandwidth of our
        infrastructure is not really stressed. (A sketch of the ROMIO
        hints that could switch data sieving off follows after these
        questions.)
      - What I/O bandwidth can we expect?
      - Does the community have some indications about achievable I/O
        bandwidth?
      - Does the community have experience with other setups, i.e.
        other parallel file systems, or possibly commercial MPI
        implementations, Lustre, GPFS, ...?
      - Should we do it entirely differently, i.e. another memory
        layout, or drop MPI-IO and do it "by hand", e.g. by sending the
        chunks of data to suitable storage locations via MPI send/recv,
        thereby bypassing MPI-IO / ROMIO?
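
For reference, this is roughly how we could pass ROMIO hints at open
time to switch data sieving off and steer collective buffering; the
hint names (romio_ds_write, romio_ds_read, romio_cb_write,
cb_buffer_size, cb_nodes) are standard ROMIO hints, while the values
shown are only guesses that would have to be tuned on our cluster.

#include <mpi.h>

/* Open the output file with data sieving disabled and collective
 * buffering forced on.  The hint values (buffer size, number of
 * aggregator nodes) are placeholders, not tuned numbers. */
static int open_with_hints(const char *path, MPI_File *fh)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_ds_write", "disable");  /* no data sieving on writes */
    MPI_Info_set(info, "romio_ds_read",  "disable");  /* no data sieving on reads  */
    MPI_Info_set(info, "romio_cb_write", "enable");   /* force two-phase collective writes */
    MPI_Info_set(info, "cb_buffer_size", "16777216"); /* 16 MB collective buffer per aggregator */
    MPI_Info_set(info, "cb_nodes", "16");             /* number of I/O aggregator nodes */

    int err = MPI_File_open(MPI_COMM_WORLD, path,
                            MPI_MODE_CREATE | MPI_MODE_RDWR, info, fh);
    MPI_Info_free(&info);
    return err;
}

This would replace the MPI_INFO_NULL open in the sketch above; the
same info object can also be passed to MPI_File_set_view.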

Best regards,

Daniel Egloff
Zürcher Kantonalbank, VFEF
Head Financial Computing
Josefstrasse 222, 8005 Zürich
Tel. +41 (0) 44 292 45 33, Fax +41 (0) 44 292 45 93
Postal address: P.O. Box, 8010 Zürich, http://www.zkb.ch
___________________________________________________________________

Disclaimer:

This message is intended only for the named recipient and may contain
confidential or privileged information.

If you have received it in error, please advise the sender by return
e-mail and delete this message and any attachments. Any unauthorised
use or dissemination of this information is strictly prohibited.