Hello,
Using an NFS-mounted /tmp "shouldn't" cause any problems, but you may as
well rule it out by making /tmp a tmpfs (ramdisk) filesystem. The
directory will only ever hold a few hundred kB at most.
I have never experienced this problem on my cluster, which is also
diskless and FC2, except for the kernel, compiler, and some other
things. However, I do not use InfiniBand, so that could be the cause.
Could you try the tcp RPI, if you compiled it? Are you using a normal
lamboot startup with ssh, or something else like TM, SLURM, or Bproc
to start the LAM?
For STDOUT to fail, the lamds would have to have broken communication.
But if they did, then you couldn't start any MPI jobs or pass messages.
Does lamnodes show all the nodes in the LAM? Can you ping them all with
tping? Are MPI jobs really starting on the remote nodes? Can you pass a
message? Are you checking the return values of MPI_Comm_rank and
MPI_Comm_size?
On Thu, 23 Jun 2005, Craig Lam wrote:
> Hello,
>
> I've set up a diskless cluster running Fedora Core 3 (modified to
> allow the diskless cluster nodes to start up). When I run an MPI job,
> it seems that stdout does not get directed from remote nodes correctly
> although all local processes' output shows up correctly. Does anyone
> know why this might be?
>
> My system set up is an 8 node dual opteron cluster running in 32-bit
> mode on Linux. Each node has dual infiniband over PCI express
> (although I am only using one interface currently). My configuration
> of MPI is done with "./configure --with-debug --prefix=/opt/lam-7.0.6
> --exec-prefix=/opt/lam-7.0.6 --with-rsh=ssh". The problem exhibits
> itself on both Lam-7.0.6 and Lam-7.1.1 (I have not tried other
> version). My diskless clusters run NFS version 4, and each cluster
> node binds /var/${HOSTNAME}/ to /var and /tmp/${HOSTNAME} to /tmp to
> give each node an individual copy of these directories (would this
> contribute to these problems?)
>
> I must admit that I am a bit stumped.
>
> Thanks for all your thoughts,
> Craig Casey
> craig.mpi_at_[hidden]
>
>
>
------------------------------------------------------------
Anthony Ciani (aciani1_at_[hidden])
Computational Condensed Matter Physics
Department of Physics, University of Illinois, Chicago
http://ciani.phy.uic.edu/~tony
------------------------------------------------------------