LAM/MPI General User's Mailing List Archives

From: Bogdan Costescu (bogdan.costescu_at_[hidden])
Date: 2004-05-17 09:51:17


On Mon, 17 May 2004, Terry Frankcombe wrote:

> We think that it's because I'm accessing a heavily loaded NFS server
> causing one or the other of my MPI processes to block and wait for
> the IO to happen, which means that it doesn't participate in the
> message passing like it should. Hence the timeout.)

I saw the same thing here some time ago. I can't really blame LAM/MPI;
I see this mostly as a cluster setup problem: the I/O should not take
that long. But users who put QM scratch files in NFS-mounted
directories have no idea of the consequences (from your sig I see that
you're doing Theoretical Chemistry, so you are probably using QM
programs). At first glance, being able to specify the GM timeout would
help somewhat, in the sense that jobs would be more likely to
continue, but do you really want to let those CPUs and Myrinet cards
sit idle while the whole job is waiting on I/O?
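
A rough way to check whether a scratch directory is slow before a job
starts is to time a small synchronous write to it. The sketch below is
only an illustration: the names time_scratch_write and pick_scratch and
the 1 MiB probe size are assumptions made up for this example, not part
of LAM/MPI or of any QM package, and a single probe against a loaded
NFS server can vary a lot from one run to the next.

```python
import os
import tempfile
import time


def time_scratch_write(directory, size_bytes=1 << 20):
    """Return the seconds taken to write and fsync size_bytes in directory."""
    payload = b"\0" * size_bytes
    start = time.perf_counter()
    # Create the probe file inside the directory under test so the write
    # goes to that filesystem; the file is deleted when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the data to storage
    return time.perf_counter() - start


def pick_scratch(candidates, size_bytes=1 << 20):
    """Return (fastest_directory, timings) for the candidate directories."""
    timings = {d: time_scratch_write(d, size_bytes) for d in candidates}
    fastest = min(timings, key=timings.get)
    return fastest, timings
```

Calling pick_scratch with, say, a node-local directory and an
NFS-mounted one returns the directory that answered fastest plus the
measured times, which could be used to warn a user before the scratch
files end up on the NFS server.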

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]