
LAM/MPI General User's Mailing List Archives


From: Rajeev Thakur (thakur_at_[hidden])
Date: 2005-06-09 12:33:27


Daniel,
       Are you using the subarray datatype? If so, there is a known
performance problem with it in MPICH2, but not in MPICH-1, so you can try
MPICH-1 for now and it should work fine. The problem in MPICH2 will be fixed
soon, but not in the upcoming MPICH2 1.0.2 release that will happen in the
next day or so; the fix will perhaps appear in a separate ROMIO release that
follows.

I am not sure if the subarray type will work at all with LAM, because LAM
(correctly) presents it as a first-class datatype, whereas the older version
of ROMIO that is in LAM expects it to be built out of MPI-1 types
internally. This will also be fixed soon.

Rajeev
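
As a point of reference, the pattern under discussion -- a subarray
datatype installed as an MPI-IO file view and written with a collective
call -- typically looks like the sketch below. The array extents, the
row-wise decomposition, and the file name are invented for illustration
(they are not from this thread), and error handling is omitted.

/* Sketch: each process writes its row block of a global 2-D array of
 * doubles through a subarray file view.  Assumes 1024 % nprocs == 0. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Global array 1024 x 1024, decomposed row-wise across processes. */
    int gsizes[2] = {1024, 1024};
    int lsizes[2] = {1024 / nprocs, 1024};
    int starts[2] = {rank * (1024 / nprocs), 0};

    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    int count = lsizes[0] * lsizes[1];
    double *buf = malloc((size_t)count * sizeof(double));
    for (int i = 0; i < count; i++)
        buf[i] = (double)rank;          /* dummy data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "testfile.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* The subarray type is handed to ROMIO here; this is the code path
     * where the MPICH2 performance problem and the LAM datatype issue
     * mentioned above show up. */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}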
 

> Date: Thu, 9 Jun 2005 08:20:32 +0200
> From: daniel.egloff_at_[hidden]
> Subject: LAM: Performance of MPI-IO / ROMIO / HDF5
> To: jsquyres_at_[hidden], lam_at_[hidden]
> Cc: trueter_at_[hidden], michael.christian.gauckler_at_[hidden],
> til-zkb_at_[hidden]
> Message-ID:
>
> <OFC312CFE7.027052BB-ONC125701B.001FEE91-C125701B.0022D734_at_[hidden]>
> Content-Type: text/plain; charset=iso-8859-2
>
> Dear LAM list
>
> We have severe performance problems with MPI-IO / ROMIO on a 100 CPU
> Linux Intel Cluster with GigE interconnects and a reasonable switch.
>
> We tried several setups:
>
> - LAM / ROMIO with pvfs2 (which has some limitations with MPI-2
> datatypes such as subarrays in combination with MPI_File_set_view)
> - MPICH2 / ROMIO with pvfs2
>
> Both setups face similar problems, most likely because the parallel
> file I/O layer is ROMIO in both cases.
>
> I guess that our "problem" is rather standard:
>
> Simulation phase:
> Write a 4-dimensional cube to a file.
> The cube is generated by different processes in blocks.
> The blocks are obtained by slicing the cube along the last
> dimension, i.e. the dimension with stride 1 (C array order).
>
> Aggregation phase:
> Read back the cube from the file, this time sliced along the first
> dimension, i.e. the dimension with the largest stride (C array order).
> Every process reads a slice, which is now just a three-dimensional
> sub-cube, and processes it, until no more slices are left to process.
> (Fortunately one slice fits entirely in memory.)
> (See the file-view sketch below.)
>
> The size of the cube is 1000 x 50 x 50 x 1000000 = 2.5 x 10^12 doubles,
> i.e. 2 x 10^13 bytes, or roughly 20 terabytes.
>
> Questions:
> -Is such a problem doable with MPI-IO? Our tests indicate that
> the "data sieving optimizaiton" done by ROMIO
> leads to an unacceptable performance degradation. A lot of
> communication happends inside MPI,
> probably done by ROMIO to do some "optimization"? The
> theoretical IO bandwidth of our infrastructure is not really stressed.
> - What IO bandwidth can we expect?
> - Has the community some indications about possible IO
> bandwidth?
> - Has the community some experience with other setups, i.e.
> other parallel file systems, or ev. commerical MPI implementations,
> Lustre, GPFS....?
> - Shall we do it entierly different, i.e. another memory layout,
> or drop MPI-IO and do it "by hand" e.g. by sending the chunks of data
> to suitable storage locations via MPI send/recv, thereby
> bypassing MPI-IO / ROMIO?
>
> Best regards,
>
> Daniel Egloff
> Zürcher Kantonalbank, VFEF
> Head Financial Computing
> Josefstrasse 222, 8005 Zürich
> Tel. +41 (0) 44 292 45 33, Fax +41 (0) 44 292 45 93
> Mailing address: P.O. Box, 8010 Zürich, http://www.zkb.ch
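
The simulation and aggregation phases quoted above map naturally onto two
subarray file views, sketched below. The helper names, the symbolic extents
N0..N3, and the even decomposition of the last dimension are assumptions
made for illustration, not code from the thread; each returned type would
be passed to MPI_File_set_view and used with MPI_File_write_all or
MPI_File_read_all as in the earlier sketch.

/* Sketch of the two file views described in the quoted message.
 * N0..N3 are the four cube dimensions, N3 being the stride-1 (last,
 * C-order) dimension; N3 is assumed divisible by nprocs. */
#include <mpi.h>

/* Simulation phase: each process owns a contiguous range of the last
 * dimension.  Note that such a block is highly non-contiguous in the
 * file (N0*N1*N2 pieces of N3/nprocs doubles each), which is exactly
 * the access pattern that triggers data sieving / two-phase I/O. */
MPI_Datatype slab_along_last_dim(int N0, int N1, int N2, int N3,
                                 int rank, int nprocs)
{
    int gsizes[4] = {N0, N1, N2, N3};
    int lsizes[4] = {N0, N1, N2, N3 / nprocs};
    int starts[4] = {0, 0, 0, rank * (N3 / nprocs)};
    MPI_Datatype t;
    MPI_Type_create_subarray(4, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &t);
    MPI_Type_commit(&t);
    return t;
}

/* Aggregation phase: one three-dimensional sub-cube per read, indexed
 * by the first (largest-stride) dimension.  This region is a single
 * contiguous run of N1*N2*N3 doubles in the file. */
MPI_Datatype slice_along_first_dim(int N0, int N1, int N2, int N3,
                                   int slice /* 0 .. N0-1 */)
{
    int gsizes[4] = {N0, N1, N2, N3};
    int lsizes[4] = {1, N1, N2, N3};
    int starts[4] = {slice, 0, 0, 0};
    MPI_Datatype t;
    MPI_Type_create_subarray(4, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &t);
    MPI_Type_commit(&t);
    return t;
}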
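
On the data-sieving question, ROMIO's behaviour can be steered through
MPI_Info hints passed to MPI_File_open or MPI_File_set_view. The sketch
below uses standard ROMIO hint keys, but whether a given ROMIO version
honours them, and which values actually help on this cluster, is not
something this thread establishes; the values shown are only a starting
point for experiments.

/* Sketch: disable data sieving for independent I/O and rely on
 * collective buffering (two-phase I/O) instead.  The buffer size is an
 * arbitrary example value. */
#include <mpi.h>

MPI_Info make_romio_hints(void)
{
    MPI_Info info;
    MPI_Info_create(&info);

    /* Data sieving for non-contiguous independent reads/writes. */
    MPI_Info_set(info, "romio_ds_read",  "disable");
    MPI_Info_set(info, "romio_ds_write", "disable");

    /* Collective buffering (two-phase I/O) for MPI_File_*_all calls. */
    MPI_Info_set(info, "romio_cb_read",  "enable");
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "cb_buffer_size", "16777216");   /* 16 MB */

    return info;
}

The returned info object would replace MPI_INFO_NULL in the MPI_File_open
call; combining such hints with the collective MPI_File_write_all /
MPI_File_read_all calls, rather than independent reads and writes, is
generally what allows ROMIO to aggregate the many small non-contiguous
pieces instead of sieving them on each process.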