
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-12-18 08:53:48


On Thu, 18 Dec 2003, Becker, Robert P wrote:

> Wow. I've been having one heck of a time getting LAM/MPI to work with
> LS-DYNA. It's been one problem after another. I just finished
> reloading the OS (Redhat 9.0) on all 3 systems and I have installed a
> fresh copy of LAM/MPI 6.5.9 on the servers. I was unable to get this
> far with 7.0.3 so, I am now using 6.5.9.

What problems did you have with 7.0.3? If there are problems with it, I'd
like to know so that we can get them fixed. As far as we know, the 7.x
series works great under RedHat 9. I'd strongly recommend using the 7.x
series over the 6.5 series (the 6.5 series is no longer supported and has
known problems).

One thing to be aware of is that RH 9 may install its own copy of LAM --
you might want to ensure that that version is removed before you start
installing your own versions. For example, you should probably check to
see if RH 9's LAM RPM is installed:

shell# rpm -qa | grep lam

If it is, and you want to have your own version of LAM installed (e.g.,
the 7.x series), you should probably remove it.
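Even after the RPM is gone, it is worth checking which mpirun your shell actually picks up, since a leftover copy earlier in PATH will shadow the one you just built. A minimal POSIX sh sketch (the function name find_mpiruns is hypothetical; `type -a mpirun` in bash gives similar information):

```shell
#!/bin/sh
# Hypothetical helper: list every executable named mpirun on PATH, in search
# order. The first line printed is the one a bare "mpirun" will run.
find_mpiruns() {
    old_ifs=$IFS
    IFS=:
    set -f                      # don't glob-expand PATH entries
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.  # an empty PATH entry means the current directory
        if [ -f "$dir/mpirun" ] && [ -x "$dir/mpirun" ]; then
            printf '%s\n' "$dir/mpirun"
        fi
    done
    set +f
    IFS=$old_ifs
}
```

If this prints more than one line, make sure the first one belongs to the LAM installation you intend to use (adjust PATH if it does not).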

> The current problem is when I execute the mpirun command for mpp970
> (LS-DYNA's mpp version). I execute the command "mpirun -np 4 mpp970
> i=Main.k memory=200000000" This starts up mpp970 on two of the systems
> and it loads two processes per system. I am able to view this with top.
> After exactly 60 seconds of mpp970 running I get the following error
> message.
>
> becker@~/test $ mpirun -np 4 mpp970 i=Main.k memory=200000000
> p0_26787: p4_error: Could not gethostbyname for host edms-dyna1; may be invalid name
> : 61
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> [snipped]

From these error messages, it looks very much like you are mixing MPI
implementations.

The p4 error is from MPICH. The "It seems that..." error is from LAM.
LAM and MPICH, while both are fine MPI implementations, do not
interoperate. While both can be installed on a set of machines
simultaneously with no problems, you must be sure that you compile, link,
and run a given MPI application with entirely one implementation.

From the output above, it looks like mpp970 was compiled with MPICH but
then you ran with LAM's mpirun.
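The reasoning above (a "p4_error" line means MPICH, the "It seems that [at least] one of the processes..." banner means LAM) can be turned into a quick check on captured mpirun output. This is only a sketch: it recognizes just those two markers, and the function name diagnose_mpi_output is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read mpirun output on stdin and report which MPI
# implementation(s) produced the error messages in it.
#   "p4_error"                                      -> MPICH (ch_p4 device)
#   "It seems that [at least] one of the processes" -> LAM's mpirun
diagnose_mpi_output() {
    text=$(cat)
    has_mpich=no
    has_lam=no
    if printf '%s\n' "$text" | grep -q 'p4_error'; then
        has_mpich=yes
    fi
    # -F: match the bracketed banner text literally, not as a regex
    if printf '%s\n' "$text" | grep -qF 'It seems that [at least] one of the processes'; then
        has_lam=yes
    fi
    case "$has_mpich/$has_lam" in
        yes/yes) echo "mixed: MPICH error output under LAM's mpirun" ;;
        yes/no)  echo "mpich" ;;
        no/yes)  echo "lam" ;;
        *)       echo "unknown" ;;
    esac
}
```

For example: `mpirun -np 4 mpp970 i=Main.k memory=200000000 2>&1 | diagnose_mpi_output`. A "mixed" result is a strong hint that the binary and the mpirun come from different MPI implementations.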

If you want to use LAM/MPI, you'll need to recompile mpp970 with LAM's
mpicc/mpiCC/mpif77 wrapper compilers, and then use LAM's mpirun, etc.
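Before recompiling, it may help to confirm that the mpicc you will compile with and the mpirun you will launch with come from the same installation. A minimal sketch, assuming both are found on PATH and that a shared bin directory means a shared installation (the function name same_mpi_install is hypothetical, and it cannot tell you *which* implementation that directory holds):

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first mpicc and the first mpirun
# on PATH live in the same directory, i.e. most likely one MPI installation.
# Returns 0 on a match, 1 on a mismatch, 2 if either tool is missing.
same_mpi_install() {
    cc_path=$(command -v mpicc) || { echo "mpicc not found on PATH" >&2; return 2; }
    run_path=$(command -v mpirun) || { echo "mpirun not found on PATH" >&2; return 2; }
    cc_dir=$(dirname "$cc_path")
    run_dir=$(dirname "$run_path")
    if [ "$cc_dir" = "$run_dir" ]; then
        echo "ok: mpicc and mpirun both from $cc_dir"
        return 0
    fi
    echo "mismatch: mpicc from $cc_dir, mpirun from $run_dir" >&2
    return 1
}
```

If it reports a match, a typical LAM run then boots the LAM run-time environment with `lamboot` (given a boot schema listing your hosts), runs `mpirun -np 4 mpp970 ...`, and shuts it down with `lamhalt`. LAM 7's wrapper compilers also accept `-showme` to print the underlying compiler command line, which is another quick way to see which installation you are using.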

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/