Has anyone seen any good tutorials on tuning the performance of
parallel code that uses MPI, such as a set of programming guidelines? I am
particularly interested in how to optimize performance at the application
level (rather than at the program level, e.g. through code transformations).
Thanks,
Guanhua