LAM/MPI General User's Mailing List Archives

From: Atle Svandal (svandal_at_[hidden])
Date: 2004-12-03 07:23:42


Machine:
 
Dual Athlon MP2400 machine running Red Hat 9.0, connected in a cluster with 4
similar machines.
 
Problem:
 
Booting LAM with lamboot on a single machine and running mpirun works on one
processor, but stalls on 2.
 
            mpirun -np 1 <program> running fine
 
            mpirun -np 2 <program> stalls at first or second MPI_Send entry
 
The strange thing is what happens when I boot two machines with a hostfile like:
 
aqnode03
aqnode04
 
Running on 2 CPUs now works fine (one on each machine). Running on 4 or
1 CPUs is also OK, but the program stalls if I try to run it on 3 CPUs.
 
The hostfile should normally be specified as:
 
aqnode03 cpu=2
aqnode04 cpu=2
 
since each node has two CPUs. Booting LAM with this hostfile results in a lot
of stalls; the only way to run the program is on 1 CPU. The hostfile
without the cpu specification works well: mpirun -np 4 runs the
program efficiently on all 4 CPUs.
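
To make explicit what the two hostfile variants above declare, here is a
minimal sketch in Python. It is a hypothetical helper (parse_hostfile and
total_cpus are illustrative names, not part of LAM/MPI), and it assumes that
a host listed without a cpu= key counts as one CPU and that # starts a
comment. It only counts declared CPUs; it does not model how mpirun places
processes.

```python
# Hypothetical helper, not part of LAM/MPI: counts the CPUs a hostfile declares.

def parse_hostfile(text):
    """Return a list of (hostname, cpu_count) pairs from LAM-style hostfile text."""
    hosts = []
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        cpus = 1  # Assumption: one CPU when no cpu= key is given.
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "cpu":
                cpus = int(value)
        hosts.append((fields[0], cpus))
    return hosts


def total_cpus(text):
    """Sum the CPU counts declared for all hosts."""
    return sum(cpus for _, cpus in parse_hostfile(text))


# The two hostfile variants from this message.
PLAIN = "aqnode03\naqnode04\n"
WITH_CPUS = "aqnode03 cpu=2\naqnode04 cpu=2\n"

if __name__ == "__main__":
    print(total_cpus(PLAIN))      # 2
    print(total_cpus(WITH_CPUS))  # 4
```

Under these assumptions the plain hostfile declares 2 CPUs and the cpu=2
variant declares 4, even though mpirun -np 4 is used with both.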
 
The problem is hardly program-specific, since we are running the same
program on two other machines (Opterons running Fedora Core 2). On those
machines the cpu option in the hostfile also works well.
 
Hopefully someone out there can answer these rather confusing questions.
 
regards
 
Atle Svandal
 
Institutt for Fysikk og Teknologi
Universitetet i Bergen
Allegaten 55 - 5007 Bergen
tlf: 55 58 32 58