Hi all,
I am worried about the possible memory requirements of collective
operations, namely MPI_Bcast and MPI_Reduce, when very large buffers are
broadcast or reduced.
In my application the buffer size is close to the available core memory
per processor.
I have found no problems so far using LAM/MPI over Ethernet. However, I
have run into big trouble with some vendors' MPI implementations (in
particular on IBM supercomputers) when using the dedicated Colony or
Federation switches.
I have also experienced some weird problems with LAM/MPI and Myrinet on
Itanium systems, but there the trouble was harder to pinpoint.
A simple cure is to chop the collective operation into smaller chunks,
but there is no obvious choice for the chunk size... In addition, I guess
it would be better to let the MPI implementation perform its own
collective optimizations and do the best job it can with the available
core memory and the latency and throughput of the interconnect.
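For illustration, here is a minimal sketch of the manual chunking I have
in mind for the broadcast case; the helper name and the 16 MB chunk size
are arbitrary choices of mine, not anything provided by LAM/MPI:

#include <mpi.h>
#include <stddef.h>

#define CHUNK_BYTES (16 * 1024 * 1024)  /* arbitrary 16 MB pieces */

/* Broadcast 'count' bytes from 'root' in pieces of at most CHUNK_BYTES,
 * so no single MPI_Bcast has to stage the whole buffer at once. */
static int chunked_bcast(void *buf, size_t count, int root, MPI_Comm comm)
{
    char  *p = (char *) buf;
    size_t remaining = count;

    while (remaining > 0) {
        int piece = (remaining > CHUNK_BYTES) ? CHUNK_BYTES : (int) remaining;
        int rc = MPI_Bcast(p, piece, MPI_BYTE, root, comm);
        if (rc != MPI_SUCCESS)
            return rc;
        p         += piece;
        remaining -= piece;
    }
    return MPI_SUCCESS;
}

The same kind of loop would apply to MPI_Reduce, advancing both the send
and receive buffers, but picking CHUNK_BYTES well for a given
interconnect is exactly the part I would rather leave to the library.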
Do you specifically take care of such issues within LAM/MPI 7.1.1 with
the various supported interconnects? And what do you plan for Open MPI?
Thanks for any hints.
--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/
_/_/_/_/ _/ _/ Dr. Pierre VALIRON
_/ _/ _/ _/ Laboratoire d'Astrophysique
_/ _/ _/ _/ Observatoire de Grenoble / UJF
_/_/_/_/ _/ _/ BP 53 F-38041 Grenoble Cedex 9 (France)
_/ _/ _/ http://www-laog.obs.ujf-grenoble.fr
_/ _/ _/ mail: Pierre.Valiron_at_[hidden]
_/ _/ _/ Phone: +33 4 7651 4787 Fax: +33 4 7644 8821
_/ _/_/