LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-02-15 08:42:59


On Feb 14, 2006, at 4:47 PM, BOYRIE Fabrice wrote:

> We have two networks on our cluster. The first one (100 Mbit/s) gives
> hostnames as node1, node2... and the second one (1 Gbit/s) as gblan1,
> gblan2...
> We use Torque on our cluster on the first network. So when we
> integrate LAM, messages are transferred on this network.
>
> The documentation suggests using lam-hostmap.txt.
> lamd was configured with
> ./configure --with-trillium \
> --prefix=/usr/local/lam-7.1.2b31 \
> --with-tm=/usr/local/torque2.0.0p7
>
> So I've added in the file /usr/local/lam-7.1.2b31/etc/lam-hostmap.txt
> node1 mpi=gbnode1
>
> (and node1.alineos.net mpi=gbnode1 to be sure)

gblan1 or gbnode1? Your text lists both names.

> But the test with NetPIPE shows that messages are still transferred
> on the slow network.
>
> How can I debug this problem? strace doesn't show any read of
> lam-hostmap.txt.

I'm guessing that you were stracing mpirun and did not see it read
the file; is that correct? If so, keep in mind that mpirun doesn't
read lam-hostmap.txt (because it does no MPI-level communication);
lam-hostmap.txt is read by the MPI processes.

Additionally, this means that the MPI processes must be able to see
and read the lam-hostmap.txt file. Can you verify two things:

1. That the names that appear in the file are correct and resolvable
on the nodes where MPI processes run
2. That the file itself is readable on the nodes where MPI processes run
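
Both checks can be scripted and run on each compute node. The following is only a rough sketch in Python, not anything shipped with LAM: the function names are hypothetical, the file format is assumed to be just the "hostname mpi=name" lines shown in the quoted text, and treating "#" as a comment marker is an assumption as well.

```python
import os
import socket


def parse_hostmap(text):
    """Return (hostname, mpi_name) pairs from lines like 'node1 mpi=gbnode1'.

    Assumption: one mapping per line, blank lines ignored, and anything
    after '#' treated as a comment.
    """
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        for field in fields[1:]:
            key, sep, value = field.partition("=")
            if key == "mpi" and sep and value:
                pairs.append((host, value))
    return pairs


def resolves(name):
    """True if this node can resolve the given name to an address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def check_hostmap(path):
    """Return a list of problems found on this node (empty if none)."""
    # Check 2: the file must be readable where the MPI processes run.
    if not os.access(path, os.R_OK):
        return ["%s: not readable on this node" % path]
    with open(path) as fh:
        pairs = parse_hostmap(fh.read())
    # Check 1: every name in the file must resolve on this node.
    problems = []
    for host, mpi_name in pairs:
        for name in (host, mpi_name):
            if not resolves(name):
                problems.append("%s: does not resolve on this node" % name)
    return problems
```

Calling check_hostmap("/usr/local/lam-7.1.2b31/etc/lam-hostmap.txt") on a compute node, as the same user that runs the MPI job, should return an empty list if both conditions hold. It has to run on the nodes themselves, since a check from the head node says nothing about what the MPI processes can see.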

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/