
LAM/MPI General User's Mailing List Archives


From: Craig Lam (craig.mpi_at_[hidden])
Date: 2005-06-25 08:15:03


Jeff,

Thanks for the reply. It seems to happen in every case. I've got a
simulator that prints out a bunch of stuff as an extreme case, and
here is another example with a 'hello world' type application; source
code and output are shown below. (Summary: every rank should print
"Comm rank %d reporting.", but only a single one does, unless I run
more MPI processes than nodes, in which case just the processes on the
local node report.)

__________________________
Source code:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
  int mpi_comm_rank;
  int mpi_comm_size;

  MPI_Init(&argc, &argv);

  MPI_Comm_size(MPI_COMM_WORLD, &mpi_comm_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_comm_rank);

  /* Every rank should print this line. */
  printf("Comm rank %d reporting.\n", mpi_comm_rank);

  MPI_Finalize();
  return 0;
}

__________________________
Output:
[craig_at_c1 mpi_test]$ mpirun -np 6 mpi_test
Comm rank 0 reporting.
[craig_at_c1 mpi_test]$
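
Jeff, if I've understood your earlier suggestion about explicit flushes,
the change would be roughly the following (just a sketch I have not tried
yet; the fflush(stdout) call after the printf is the only difference):

__________________________
Sketch with explicit flush:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
  int mpi_comm_rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_comm_rank);

  printf("Comm rank %d reporting.\n", mpi_comm_rank);
  fflush(stdout);  /* explicitly flush stdout in case remote output is stuck in a buffer */

  MPI_Finalize();
  return 0;
}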

Any ideas at all are greatly appreciated.

Thanks,
Craig Casey,
craig.mpi_at_[hidden]

On 6/25/05, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Can you give a concrete example of this?
>
> Do you have a lot of stdout from the processes running on the nodes, or
> just a little output (and then program termination)?
>
> If it's just a little output, you might want to put explicit fflush()
> statements in your application (I'm assuming that this is a C
> application?).
>
> On Jun 23, 2005, at 11:17 PM, Craig Lam wrote:
>
> > Hello,
> >
> > I've set up a diskless cluster running Fedora Core 3 (modified to
> > allow the diskless cluster nodes to start up). When I run an MPI job,
> > it seems that stdout does not get directed from remote nodes correctly
> > although all local processes' output shows up correctly. Does anyone
> > know why this might be?
> >
> > My system setup is an 8-node dual Opteron cluster running in 32-bit
> > mode on Linux. Each node has dual InfiniBand over PCI Express
> > (although I am only using one interface currently). My configuration
> > of MPI is done with "./configure --with-debug --prefix=/opt/lam-7.0.6
> > --exec-prefix=/opt/lam-7.0.6 --with-rsh=ssh". The problem exhibits
> > itself on both LAM-7.0.6 and LAM-7.1.1 (I have not tried other
> > versions). My diskless cluster runs NFS version 4, and each cluster
> > node binds /var/${HOSTNAME}/ to /var and /tmp/${HOSTNAME} to /tmp to
> > give each node an individual copy of these directories (would this
> > contribute to these problems?)
> >
> > I must admit that I am a bit stumped.
> >
> > Thanks for all your thoughts,
> > Craig Casey
> > craig.mpi_at_[hidden]
> >
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>