James,
It is very peculiar that it fails in the same way with LAM/MPI, MPICH
and MPICH2, but not with SCore. :/ I must admit first of all that I
don't know a whole lot about the SCore implementation of the MPI
standard, so I can't speak to SCore's capabilities/features in
comparison with LAM/MPI.
Have you tried using the latest beta release of LAM/MPI?
http://www.lam-mpi.org/beta/
There have been many bug fixes since the 7.1.1 release, and the beta
*may* include a fix for this problem (but without more information I
can't say for sure).
To help diagnose this problem, I have a couple of questions:
- Are you running a scheduler on the cluster, or are you relying on the
rsh/ssh boot module?
- Does LAM/MPI hang when trying to run a simple 'Hello World' style MPI
program?
- What type of program are you attempting to run when it hangs?
- Are you using a SuSE-packaged version of LAM/MPI, or building it from
source?
- Have you confirmed that the environments on all the nodes are set up
properly to use LAM/MPI instead of any of the other MPI implementations
that you have installed?
Cheers,
Josh
On Sep 19, 2005, at 4:32 AM, James Coomer wrote:
> Hi,
>
> I have been trying to get LAM 7.1.1 (and MPICH, for that matter) to run
> on a 16-node SuSE 9.3 cluster of dual-processor 64-bit Xeon (Nocona)
> nodes over gigabit.
>
> When I request more than one process per node, my job hangs fairly soon
> and I have to Control-C it and clean up. One-process-per-node jobs
> run OK.
>
> I have tried GNU/Intel compilers, in 32- and 64-bit mode. I have also
> tried MPICH and MPICH2, which show the same problem. I have tried
> different executables (including PMB) compiled under my different MPIs.
>
> When I use SCore, however (SCore is a less well-known MPI), it all
> works fine.
>
> Many thanks for any help...
>
> James
>
>
> --
>
>
> Dr James Coomer
> HPC and Grid Solutions
> Streamline Computing
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/