Hello,
I'm not a LAM/MPI developer, and I'm trying to get PamCrash (a crash simulation code) to work.
The machine is a single 4-way Opteron running SuSE Linux 9.0, with LAM/MPI version 6.5.9 installed.
I set these environment variables:
PAMHOME=/usr/local/Pamcrash2002R2
PCHOME=/usr/local/Pamcrash2002R2/pamcrash_safe
LAMHOME=/usr/local/bin
And then:
$PAMHOME/Dmprun.csh -pg $PAMHOME/pamcrash_safe/Linux/v2002DMP_P4.x CMC_side_b_2002.pc -np 2 -cf bhost -wd /usr/CALCOLO
The command file Dmprun.csh contains:
$LAMHOME/lamboot -v bhost
$LAMHOME/mpirun -np 2 $PAMHOME/pamcrash_safe/Linux/v2002DMP_P4.x CMC_side_b_2002.pc
(CMC_side_b_2002.pc is the FEM model to be analyzed.)
The message that I receive is:
LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
Executing hboot on n0 (Cluster - 2 CPUs)...
topology done
Then it hangs.
I reviewed the FAQ, and I think category 5, point 24 describes my problem, but:
1) I have only a single 4-way machine, so there cannot be a problem of different LAM versions across nodes.
2) My application is compiled/linked with version 6.5.6, but I know that the application and LAM work smoothly on another cluster. (Also, I cannot find version 6.5.6 anywhere on the internet.)
3) I've used the full path for both mpirun and my application.
I've also tried running lamboot on its own:
> lamboot -v bhost
LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
Executing hboot on n0 (Cluster - 1 CPU)...
topology done
>
so I think lamboot is OK with bhost:
Cluster 2
and after lamboot, tping works.
I don't think the problem is in the application, because it runs smoothly on a cluster with 64 CPUs (32 dual-processor nodes).
Can anyone help?
Thanks