On Oct 16, 2004, at 6:39 PM, <daniele.speziani_at_[hidden]> wrote:
> I'm not an LAM/MPI developer and I'm trying to let PamCrash ( Crash
> simulation ) work. The machine is one 4-way Opteron, SuSE linux 9.0,
> and LAM/MPI ver 6.5.9 installed.
>
> I set these environment variables:
>
> PAMHOME=/usr/local/Pamcrash2002R2
> PCHOME=/usr/local/Pamcrash2002R2/pamcrash_safe
> LAMHOME=/usr/local/bin
The value of LAMHOME doesn't seem right... I'm guessing it should
either be /usr/local or you should just leave it unset. There are few
conditions where it is actually necessary to set LAMHOME.
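As a rough way to check this, here is a minimal sketch in POSIX sh. The function name check_lamhome is made up for illustration, and it assumes the usual layout in which LAM's executables sit in a bin/ directory under the install prefix; adjust if your installation differs.

```shell
# Hypothetical helper (not part of LAM): report whether a candidate
# LAMHOME looks like an install prefix, i.e. whether it contains
# executable bin/lamboot and bin/mpirun files.
check_lamhome() {
    prefix="$1"
    for tool in lamboot mpirun; do
        if [ ! -x "$prefix/bin/$tool" ]; then
            echo "no $tool under $prefix/bin"
            return 1
        fi
    done
    echo "$prefix looks like a LAM prefix"
    return 0
}
```

For example, `check_lamhome /usr/local` versus `check_lamhome /usr/local/bin` shows which of the two values has the expected layout. Note that the quoted Dmprun.csh calls $LAMHOME/lamboot directly, so if LAMHOME becomes /usr/local (or is unset) those lines would need to point at the bin/ directory or rely on the PATH instead.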
> And then:
>
> $PAMHOME/Dmprun.csh -pg $PAMHOME/pamcrash_safe/Linux/v2002DMP_P4.x
> CMC_side_b_2002.pc -np 2 -cf
> bhost -wd /usr/CALCOLO
>
> Where the command file Dmprun.csh is:
> $LAMHOME/lamboot -v bhost
> $LAMHOME/mpirun -np 2 $PAMHOME/pamcrash_safe/Linux/v2002DMP_P4.x
> CMC_side_b_2002.pc ( this file is the FEM model that is to be
> analyzed )
>
> the message that I receive is:
>
>
> LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
>
> Executing hboot on n0 (Cluster - 2 CPUs)...
> topology done
>
> Then it hangs.
> I reviewed the FAQ and in category 5, point 24, I think that I found
> my problem but:
>
> 1) I have only a single 4-way machine so there isn't any problem of
> different versions of LAM across the nodes.
> 2) My application is compiled/linked with version 6.5.6 but I know
> that the application and LAM works smoothly on another cluster. ( also
> I cannot find version 6.5.6 around the internet )
Note that LAM is *NOT* guaranteed to be binary compatible between
different versions. You may well be having a problem because the 6.5.9
mpirun is expecting something different than what your 6.5.6
application is sending.

Honestly, the 6.5.x series was so long ago that I really don't remember
if we changed the startup protocols between those versions. I'm glad
that you can't find 6.5.6 on the net anywhere ;-) because it's so old
and is unsupported. We generally don't make old versions available
unless someone really, really needs it. But even then, only with dire
warnings. :-)

Can you recompile your app with 6.5.9, or, better yet, upgrade to 7.0.x
or 7.1.x? The 7 series are much more modern, up-to-date, and are the
only supported versions of LAM.
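To make the mismatch explicit, the following is a small sketch in POSIX sh. Both function names are hypothetical, it only parses the banner line that lamboot prints (as quoted in this thread), and the version the application was built with has to be supplied by hand since nothing here inspects the binary.

```shell
# Hypothetical helper: pull the LAM version out of a lamboot banner line
# such as "LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University".
lam_banner_version() {
    printf '%s\n' "$1" | sed -n 's/^LAM \([0-9][0-9.]*\)\/.*/\1/p'
}

# Hypothetical helper: compare the runtime version from the banner with
# the version the application was built against (passed as an argument).
check_lam_match() {
    runtime=$(lam_banner_version "$1")
    built="$2"
    if [ "$runtime" = "$built" ]; then
        echo "match: $runtime"
        return 0
    fi
    echo "mismatch: runtime $runtime, application built with $built"
    return 1
}
```

Calling `check_lam_match "LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University" 6.5.6` reports the mismatch described above; a matching pair returns success.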
> 3) I've used the full path for mpirun and my application
>
> I've also tried to do a lamboot :
>> lamboot -v bhost
>
> LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
>
> Executing hboot on n0 (Cluster - 1 CPU)...
> topology done
>>
> so I think lamboot is OK with bhost:
> Cluster 2
> and after lamboot tping works
>
> I don't think that the problem is in the application because it
> works smoothly on a cluster with 64 CPUs ( 32 bi-processors ).
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/