Jeff,
Thanks for the reply. It seems to happen in every case. I've got a
simulator that prints a lot of output as an extreme case, and here is
another example, a 'hello world' type application. The source code and
output are shown below. (In summary, every rank should print "Comm
rank %d reporting." with its own mpi_comm_rank, but only one does. If
I run more MPI processes than there are nodes, only the processes on
the local node report.)
__________________________
Source code:
#include <stdio.h>   /* needed for printf() */
#include <mpi.h>

int main(int argc, char* argv[])
{
    int mpi_comm_rank;
    int mpi_comm_size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_comm_rank);
    /* Every rank should print exactly one line. */
    printf("Comm rank %d reporting.\n", mpi_comm_rank);
    MPI_Finalize();
    return 0;
}
_________________
OUTPUT
---------------------
[craig_at_c1 mpi_test]$ mpirun -np 6 mpi_test
Comm rank 0 reporting.
[craig_at_c1 mpi_test]$
Any ideas at all are greatly appreciated.
Thanks,
Craig Casey,
craig.mpi_at_[hidden]
On 6/25/05, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Can you give a concrete example of this?
>
> Do you have a lot of stdout from the processes running on the nodes, or
> just a little output (and then program termination)?
>
> If it's just a little output, you might want to put explicit fflush()
> statements in your application (I'm assuming that this is a C
> application?).
>
> On Jun 23, 2005, at 11:17 PM, Craig Lam wrote:
>
> > Hello,
> >
> > I've set up a diskless cluster running Fedora Core 3 (modified to
> > allow the diskless nodes to boot). When I run an MPI job, stdout
> > from processes on remote nodes does not seem to be forwarded
> > correctly, although output from all local processes shows up
> > correctly. Does anyone know why this might be?
> >
> > My setup is an 8-node dual-Opteron cluster running Linux in 32-bit
> > mode. Each node has dual InfiniBand over PCI Express (although I am
> > only using one interface currently). I configured LAM with
> > "./configure --with-debug --prefix=/opt/lam-7.0.6
> > --exec-prefix=/opt/lam-7.0.6 --with-rsh=ssh". The problem shows up
> > with both LAM 7.0.6 and LAM 7.1.1 (I have not tried other
> > versions). The diskless nodes use NFS version 4, and each node
> > bind-mounts /var/${HOSTNAME}/ onto /var and /tmp/${HOSTNAME} onto
> > /tmp to give it its own copy of these directories. Could this
> > contribute to the problem?
> >
> > I must admit that I am a bit stumped.
> >
> > Thanks for all your thoughts,
> > Craig Casey
> > craig.mpi_at_[hidden]
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/