
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-08-30 07:53:41


On Aug 28, 2005, at 12:48 PM, Lei_at_ICS wrote:

> My experiment today takes the Matlab out of the equation.
> Now the question is why I am unable to connect to the server
> from an MPI singleton that is run without using mpirun.

Hum -- that should not be. Singletons should be able to connect/accept
-- if they can't, that's a bug in LAM! :-(

And yes, I just confirmed this behavior. Let me look into a fix and
roll it up into another 7.1.2 beta -- I'll mail later about this.

> Two related questions:
> (1) When my client is run from mpirun, but the server is not
> started, if my client MPI_Lookup_name, it will crash with
> the error:
> MPI_Lookup_name: publishing service: name is not published (rank 0,
> MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Lookup_name()
> Rank (0, MPI_COMM_WORLD): - main()

I [finally] replied to this -- sorry for the delay...

> (2) If I control-C the server, obviously the server will not
> have a chance to MPI_Unpublish_name. The next time
> I start the server, it will crash with the error:
> MPI_Publish_name: publishing service: name is published (rank 0,
> MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Publish_name()
> Rank (0, MPI_COMM_WORLD): - main()
>
> I can lamboot again to solve this problem. But is there a way to remove
> the left-over published name from within my server code?

The lamclean command will kill *everything* in the universe without
re-lambooting (including published names). There's unfortunately no
simple LAM tool for a finer-grained solution than this (i.e., just
unpublish a single name), but the suggestion was made that you could
cron an MPI job to [intelligently] unpublish the name (i.e., check some
conditions and unpublish if a) there is a LAM universe still in
existence, b) the name is still published, and c) the originator of the
name is now gone). This would likely contain some logic outside of MPI
to help determine these conditions.
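
To make the three conditions above concrete, here is a minimal sketch of the
decision logic in plain C. It is only an illustration under stated
assumptions, not a LAM tool: the helper names (process_is_alive,
should_unpublish) are hypothetical, and it assumes the server recorded its
PID somewhere the cron job can read and that both run on the same node.
How you determine (a) and (b) is left to the caller; one option would be to
run the check as an MPI job that calls MPI_Lookup_name, and then
MPI_Unpublish_name if this function returns true.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for kill() under strict ISO C modes */
#endif

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: true if a process with this PID exists on this node.
 * kill(pid, 0) delivers no signal; it only performs the existence and
 * permission checks. EPERM means the process exists but is owned by
 * another user. Caveat: PIDs can be reused, so this is a heuristic. */
static bool process_is_alive(pid_t pid)
{
    if (pid <= 0)
        return false;              /* not a valid single-process PID */
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}

/* Hypothetical helper: unpublish only if
 *   a) a LAM universe still exists,
 *   b) the name is still published, and
 *   c) the process that published it is gone. */
static bool should_unpublish(bool universe_alive, bool name_published,
                             pid_t originator_pid)
{
    return universe_alive && name_published &&
           !process_is_alive(originator_pid);
}
```

The "outside of MPI" part is the PID check; MPI itself has no standard way
to ask who published a name, which is why the originator's PID is assumed
to be saved separately.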

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/