
LAM/MPI General User's Mailing List Archives


From: BOYRIE Fabrice (boyrie_at_[hidden])
Date: 2006-02-15 11:17:07


On Wed, Feb 15, 2006 at 08:42:59AM -0500, Jeff Squyres wrote:
> gblan1 or gbnode1? Your text lists both names.

  gbnode1. Sorry for the confusion.
>
> > But the test with NetPipe shows that messages are still transferred
> > on the slow network.
> >
> > How can I debug this problem ? strace doesn't show any read of
> > lam-hostmap.txt.
>
> I'm guessing that you were stracing mpirun and did not see it read --
> is that correct?

  I've tried to strace lamd, lamboot and mpirun. The problem is I don't
know how to strace the MPI processes.

> If so, keep in mind that mpirun doesn't read lam-hostmap.txt
> (because it does no MPI-level communications) -- lam-hostmap.txt
> is read by the MPI processes.
>
> Additionally, this means that the MPI processes must be able to
> see/read the lam-hostmap.txt file. Can you verify two things:
>
> 1. That the names that appear in the file are correct and resolvable
> on the nodes where MPI processes run
> 2. That the file itself is readable on the nodes where MPI processes run

rsh gbnode27
cat /usr/local/lam-7.1.2b31/etc/lam-hostmap.txt
# Copyright (c) 2001-2003 The Trustees of Indiana University.
# All rights reserved.
[...]
hostname
node27.alineos.net

cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.150 master0.alineos.net master0
192.168.1.100 node0.alineos.net node0
192.168.1.101 node1.alineos.net node1
192.168.1.102 node2.alineos.net node2
[...]
192.168.2.150 gbmaster0
192.168.2.100 gbnode0
192.168.2.101 gbnode1

 NB: The host command doesn't work because no DNS server is
reachable from the hosts.
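
 Since host only queries DNS, the two checks requested above (names
resolvable, file readable) could instead go through the normal system
resolver, which also consults /etc/hosts. A minimal sketch, assuming
Python is available on the nodes; the helper names parse_hosts,
resolve, is_readable and report are hypothetical and not part of LAM:

```python
import os
import socket


def parse_hosts(text):
    """Map every name and alias in /etc/hosts-style text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, ip)  # first entry wins, as in /etc/hosts
    return table


def resolve(name):
    """Resolve a name through the system resolver; return None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def is_readable(path):
    """True if the current user can read the file at path."""
    return os.access(path, os.R_OK)


def report(names, hostmap_path, hosts_path="/etc/hosts"):
    """Print what the hosts file and the resolver say about each name."""
    with open(hosts_path) as f:
        table = parse_hosts(f.read())
    print(hostmap_path, "readable:", is_readable(hostmap_path))
    for name in names:
        print(name, "hosts file:", table.get(name), "resolver:", resolve(name))
```

 Calling report(["gbmaster0", "gbnode0", "gbnode1"],
"/usr/local/lam-7.1.2b31/etc/lam-hostmap.txt") on each node where MPI
processes run would show whether the gb* names resolve there and
whether the file is readable by the user running the job.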

  Fabrice BOYRIE