LAM/MPI General User's Mailing List Archives

From: Andras Balogh (abalogh_at_[hidden])
Date: 2003-09-29 10:26:38


Thanks for all the responses.

This is a Beowulf cluster with a shared file system.
From the help I got, I suspect it was a cache problem
with the cluster.
Right now I cannot reproduce the problem.
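
If it does come back, one way to test the stale-cache theory would be to
checksum the binary on every node and compare the results. The following is
only a rough sketch of that idea: the helper names (file_digest, stale_nodes)
are made up for illustration, and how the per-node digests get collected
(rsh, ssh, a small MPI job) is left open.

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def stale_nodes(digests, reference_node):
    """Given {node_name: digest}, return the sorted node names whose
    digest differs from the one reported by `reference_node`."""
    ref = digests[reference_node]
    return sorted(node for node, d in digests.items() if d != ref)
```

Running file_digest on the same path on each node and feeding the results to
stale_nodes would list the nodes that still see an old copy of the binary.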

On Sun, 28 Sep 2003, Brian Barrett wrote:

> LAM just calls fork()/exec() out on the remote nodes. We used to have
> the problems you describe when all the LAM development workstations
> used AFS, which did heavy client-side caching. Of course, by the time
> you logged into the node to figure out what was going wrong, the cache
> was invalidated and everything worked as expected.
>
> If you are having repeated problems and are on a shared filesystem, you
> might want to talk to your systems administrator. It sounds like you
> may be having some problems on your machine. If you aren't using a
> common filesystem, you might want to try using the -s option to mpirun.
> Having mpirun push the binary out may be slightly less error-prone
> than doing it by hand.
>
> Either way, this isn't a LAM problem, but just some of the pain of
> working on clusters...
>
> Brian
>
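
For anyone reading this later, the fork()/exec() pattern Brian mentions looks
roughly like the sketch below. This is not LAM's code, just a minimal Unix-only
illustration in Python; the function name spawn and the choice of exit code 127
for a failed exec are my own assumptions. The point is that exec simply loads
whatever file the node's filesystem presents at that path, so a stale cached
copy gets run without complaint.

```python
import os

def spawn(argv):
    """Fork a child, exec `argv` in it, and return the child's exit code
    (or the negated signal number if it was killed by a signal)."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed, e.g. the binary is missing on this node.
            os._exit(127)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

For example, spawn(["/bin/true"]) would be expected to return 0 on a typical
Linux node, while a path that does not exist falls into the failed-exec branch.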
Andras