I have reinstalled LAM/MPI 7.0.3 on two of the nodes and have tried to get it working. I am able to lamboot the two nodes, but when I try to run mpp-dyna (the build for 7.0.3), I get the following error message:
becker@~/mpp-lam703 $ mpirun -np 4 mpp970 i=~/test/Main.k memory=200000000
-----------------------------------------------------------------------------
It seems that there is no lamd running on the host edms-head1.
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
It seems that there is no lamd running on the host edms-head1.
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
It seems that there is no lamd running on the host edms-dyna2.
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
I have switched my $PATH so that everything points to /usr/local/lam-7.0.3/ rather than the 6.5.9 directory, and I verified this with "which mpirun" and so on. I have also verified with "ps -aux" that lamd is running on all servers. Running "lamnodes" reports:
becker@~/mpp-lam703 $ lamnodes
n0 edms-head1:2:origin,this_node
n1 edms-dyna2:2:
This is correct. I have also tried running "lamexec -np 4 hostname", and it reports correctly as well:
becker@~/mpp-lam703 $ lamexec -np 4 hostname
edms-head1
edms-head1
edms-dyna2
edms-dyna2
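One way to double-check the $PATH switch is to confirm that every LAM command resolves to the same install prefix. The shell sketch below does that; check_lam_prefix is a hypothetical helper name, and the list of commands it checks is an assumption. Because non-interactive ssh shells may read different startup files than a login shell, it is worth running the same check on each node, not only on the head node.

```shell
#!/bin/sh
# Hypothetical helper: print where each LAM command resolves on $PATH and
# return non-zero if any is missing or lives outside the expected prefix.
check_lam_prefix() {
    lam_prefix=$1
    lam_rc=0
    # Assumed command list; extend it with mpif77, mpiCC, etc. as needed.
    for lam_cmd in mpirun lamboot lamnodes lamexec mpicc; do
        if ! lam_loc=$(command -v "$lam_cmd"); then
            echo "$lam_cmd: not found on PATH"
            lam_rc=1
            continue
        fi
        case $lam_loc in
            "$lam_prefix"/*) echo "$lam_cmd -> $lam_loc" ;;
            *) echo "$lam_cmd -> $lam_loc (NOT under $lam_prefix)"; lam_rc=1 ;;
        esac
    done
    return $lam_rc
}

# Example (run on every node):
#   check_lam_prefix /usr/local/lam-7.0.3
```

A non-zero return from any node means some LAM command is still coming from another installation there.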
All I have done is run the following on all servers:
./configure --prefix=/usr/local/lam-7.0.3 --with-rsh="ssh -x"
make
make install
and then run the following on the head node to boot LAM:
lamboot -v hostfile
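For reference, a LAM boot schema for the two nodes shown by lamnodes could look like the sketch below. The host names come from the lamnodes output, and cpu=2 mirrors the CPU count of 2 it reports for each node; the actual contents of the hostfile used here are not shown, so this file is an assumption.

```shell
#!/bin/sh
# Assumed boot schema: one host per line, with cpu=N giving the number of
# CPUs LAM should count for that node. Note: this overwrites ./hostfile.
cat > hostfile <<'EOF'
edms-head1 cpu=2
edms-dyna2 cpu=2
EOF

# Then, on the head node:
#   lamboot -v hostfile
#   lamnodes        # should list n0 and n1 with 2 CPUs each
```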
Am I missing something? It seems like this should work, and it's not that difficult to set up, but for some reason it just doesn't like me.
-Rob
-----Original Message-----
From: Jeff Squyres [mailto:jsquyres_at_[hidden]]
Sent: Thursday, December 18, 2003 8:54 AM
To: General LAM/MPI mailing list
Subject: Re: LAM: p4_error: Could not gethostbyname
On Thu, 18 Dec 2003, Becker, Robert P wrote:
> Wow. I've been having one heck of a time getting LAM/MPI to work with
> LS-DYNA. It's been one problem after another. I just finished
> reloading the OS (Red Hat 9.0) on all 3 systems and I have installed a
> fresh copy of LAM/MPI 6.5.9 on the servers. I was unable to get this
> far with 7.0.3, so I am now using 6.5.9.
What problems did you have with 7.0.3? If there are problems with it, I'd
like to know so that we can get them fixed. As far as we know, the 7.x
series works great under RedHat 9. I'd strongly recommend using the 7.x
series over the 6.5 series (the 6.5 series is no longer supported and has
known problems).
One thing to be aware of is that RH 9 may install its own copy of LAM --
you might want to ensure that that version is removed before you start
installing your own versions. For example, you should probably check to
see if RH 9's LAM RPM is installed:
shell# rpm -qa | grep lam
If it is, and you want to have your own version of LAM installed (e.g.,
the 7.x series), you should probably remove it.
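A slightly more targeted version of that check, as a sketch: it keeps only packages whose name is exactly lam (assuming that is what Red Hat's package is called) and lists where each one installs mpirun, so you can compare it against the mpirun in your own install prefix. lam_rpm_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical filter: read `rpm -qa` output on stdin and keep only
# packages named exactly "lam" (lines such as lam-<version>-<release>).
lam_rpm_names() {
    grep -E '^lam-[0-9]'
}

if command -v rpm > /dev/null 2>&1; then
    for pkg in $(rpm -qa | lam_rpm_names); do
        echo "installed: $pkg"
        # Show where this package puts mpirun (if it ships one).
        rpm -ql "$pkg" | grep '/mpirun$' || true
    done
fi

# To remove it (as root), for example:
#   rpm -e lam
```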
> The current problem is when I execute the mpirun command for mpp970
> (LS-DYNA's mpp version). I execute the command "mpirun -np 4 mpp970
> i=Main.k memory=200000000" This starts up mpp970 on two of the systems
> and it loads two processes per system. I am able to view this with top.
> After exactly 60 seconds of mpp970 running I get the following error
> message.
>
> becker@~/test $ mpirun -np 4 mpp970 i=Main.k memory=200000000
> p0_26787: p4_error: Could not gethostbyname for host edms-dyna1; may be invalid name
> : 61
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> [snipped]
From these error messages, it looks very much like you are mixing MPI
implementations.
The p4 error is from MPICH. The "It seems that..." error is from LAM.
LAM and MPICH, while both are fine MPI implementations, do not
interoperate. While both can be installed on a set of machines
simultaneously with no problems, you must be sure that you compile, link,
and run a given MPI application with entirely one implementation.
From the output above, it looks like mpp970 was compiled with MPICH but
then you ran with LAM's mpirun.
If you want to use LAM/MPI, you'll need to recompile mpp970 with LAM's
mpicc/mpiCC/mpif77 wrapper compilers, and then use LAM's mpirun, etc.
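Before recompiling, one rough way to check which implementation an executable was built against is to search it for MPICH's p4 strings, such as the p4_error prefix in the output quoted above. This is a heuristic sketch, not an official tool: guess_mpi_impl is a hypothetical name, and it only inspects the file itself, so for a dynamically linked binary the MPI libraries it pulls in (visible with ldd) have to be checked too.

```shell
#!/bin/sh
# Heuristic sketch: report "mpich-p4" if the file contains the string
# "p4_error" (emitted by MPICH's ch_p4 device), otherwise "unknown".
# Finding nothing does not prove the binary is LAM-linked.
guess_mpi_impl() {
    if grep -a -q 'p4_error' "$1"; then
        echo "mpich-p4"
    else
        echo "unknown"
    fi
}

# Example:
#   guess_mpi_impl ./mpp970
#   ldd ./mpp970      # for dynamic binaries, also inspect the linked MPI libs
```

If it reports mpich-p4, that is consistent with the diagnosis above: the binary and the mpirun used to launch it come from different implementations.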
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/