-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Thursday, July 24, 2003, at 05:26 PM, Pak, Anne O wrote:
> related to an earlier posting, i've noticed that on the slave nodes
> (which have been dying prematurely), there seems to be multiple
> instance of lines similar to the one below..however, there is only one
> on the master node. is there only supposed to be one per mpirun
> execution?
>
> /usr/bin/lamd -H 10.1.1.1 -P 438
>
> can someone please explain what each of the numbers mean? i.e. -H
> 10.1.1.1 and -P 438...i don't see any process ID's 438's.
That doesn't seem quite right (especially if all the processes are
owned by you). There should only ever be one instance of the lamd
running on a particular node. From time to time, you may be able to
see multiple instances of the lamd running, as the lamd does fork/exec
other processes (so there is a short period of time where the daemon
has forked but not yet execed), but this shouldn't be a very common
thing.
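To tell a transient fork apart from a genuinely duplicated daemon, it can help to look at the parent PID of each lamd entry. The following is only a rough sketch: it assumes you captured the text of something like 'ps -eo pid,ppid,args', and the function names and the sample PIDs are made up for illustration, not part of LAM.

```python
def find_lamd_processes(ps_output):
    """Return (pid, ppid) pairs for every line whose executable is named lamd.

    Assumes three whitespace-separated columns: PID, PPID, full command line.
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # header line or something we cannot parse
        pid, ppid, args = fields
        exe = args.split()[0]
        if exe.rsplit("/", 1)[-1] == "lamd":
            found.append((int(pid), int(ppid)))
    return found


def forked_children(procs):
    """Return PIDs of lamd entries whose parent is itself a lamd entry.

    These are candidates for 'forked but not yet execed' children.
    """
    lamd_pids = {pid for pid, _ in procs}
    return [pid for pid, ppid in procs if ppid in lamd_pids]


# Hypothetical capture; the PIDs are invented for this example.
sample = """\
  PID  PPID COMMAND
 1201     1 /usr/bin/lamd -H 10.1.1.1 -P 438
 1310  1201 /usr/bin/lamd -H 10.1.1.1 -P 438
 1402     1 /bin/sh
"""
procs = find_lamd_processes(sample)
children = forked_children(procs)
```

If the extra lamd entries are still there a few seconds later, or their parent is not another lamd, they are probably not just the fork/exec window.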
Are all the other processes showing up in ps as you would expect (the
slave processes for your app, that is)? Could you start LAM with
debugging enabled (lamboot -v) and send me (off the list would be best)
the output of the LAM daemons? This is a bit difficult with LAM 6.5.x,
but you can probably get away with 'grep LAM /var/log/syslog > lamd.log'.
In LAM 7.0, the data should be in a text file named debug.txt in the
session directory (the directory LAM creates in /tmp); just grab that
file after the app dies but before running lamhalt or wipe.
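If grep is not handy, the same filtering can be done in a few lines of Python. This is a minimal sketch equivalent to the grep above; the function name is made up, and the paths are whatever applies on your system (syslog may live somewhere other than /var/log/syslog).

```python
def extract_lam_lines(src_path, dst_path):
    """Copy every line containing 'LAM' from src_path to dst_path.

    Case-sensitive, like plain grep. Returns the number of lines copied.
    """
    count = 0
    # errors="replace" keeps going if the log contains odd bytes
    with open(src_path, "r", errors="replace") as src, open(dst_path, "w") as dst:
        for line in src:
            if "LAM" in line:
                dst.write(line)
                count += 1
    return count
```

For example, extract_lam_lines("/var/log/syslog", "lamd.log") would produce the same lamd.log as the grep command, assuming you have read access to the syslog file.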
As for the options to the lamd, the two you see above tell the lamd how
to reach the lamboot process at startup to find its neighbor
information. The -H is the IP address and the -P is the port number
(so 438 is a port, not a process ID). After the initial routing table
is filled in, this information is no longer used.
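To make the meaning of the two options concrete, here is a small sketch that pulls them out of a command line as it appears in ps. It only knows about -H and -P as described above; the function name and the dictionary keys are invented for illustration.

```python
import shlex


def parse_lamd_args(cmdline):
    """Extract the lamboot contact address from a lamd command line.

    -H is taken as the IP address and -P as the port number of the
    lamboot process; any other tokens are ignored.
    """
    tokens = iter(shlex.split(cmdline)[1:])  # skip the executable itself
    opts = {}
    for tok in tokens:
        if tok == "-H":
            opts["host"] = next(tokens, None)
        elif tok == "-P":
            port = next(tokens, None)
            opts["port"] = int(port) if port is not None else None
    return opts


contact = parse_lamd_args("/usr/bin/lamd -H 10.1.1.1 -P 438")
# contact == {"host": "10.1.1.1", "port": 438}
```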
Hope this helps,
Brian
- --
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (Darwin)
iD8DBQE/IV+u3TvSMqaebW4RAoa1AKDdXW11ZMwqq1xP1wxRSHDOKe49OACeOAxf
uIrGxbFFN3bFuxp46k2gayA=
=BrlW
-----END PGP SIGNATURE-----