LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2010-10-06 14:53:44


You should be able to run lamboot with no arguments for a localhost-only solution. You should not need to modify the lamd conf file - it should be installed via "make install".

As for the messages not being passed, did you recompile your app against the new LAM installation?
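
For reference, the single-node sequence discussed here could look roughly like the sketch below. It is only an illustration, assuming the 7.1.4 binaries are first in PATH; my_app.c, my_app, appschema, and the run / lam_single_node helper functions are hypothetical names, not anything shipped with LAM.

```shell
# Sketch only: one single-node LAM/MPI 7.1.x session, start to finish.
# Hypothetical names: my_app.c, my_app, appschema, run, lam_single_node.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

lam_single_node() {
    run laminfo &&                    # check which LAM installation is first in PATH
    run mpicc -o my_app my_app.c &&   # rebuild the app against the new installation
    run lamboot &&                    # no arguments: boot LAM on localhost only
    run mpirun appschema &&           # start the processes listed in the app schema
    run lamhalt                       # shut the LAM runtime down afterwards
}
```

Calling lam_single_node with DRY_RUN=1 set just lists the five commands, which is a cheap way to review the sequence before running it for real.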

Sent from my PDA. No type good.

On Oct 6, 2010, at 11:02 AM, "David Shochat" <david.shochat_at_[hidden]> wrote:

> Tried using 7.1.4. Note that in our application, MPI is used only on a
> single node. The way we run using 7.0 is simply to start lamd (no
> arguments) before invoking mpirun (with an app schema). This won't
> work with 7.1.4. The only way I could figure out how to do it was to
> use lamboot, which seems odd since it appears to be meant for starting
> things up on multiple nodes. So I just do:
> hostname > bhost
> and then run lamboot. This produced an error about not finding
> lam-conf.lamd. I thought it would be enough to set LAMHOME, since the
> default location is $LAMHOME/etc, but still got the same error. So I
> used an explicit -c $LAMHOME/etc/lam-conf.lamd and that made the error
> go away. Is there a more straightforward way to handle our (single
> node) situation, more akin to the way that works in 7.0?
> Anyway, mpirun now starts up all of the processes listed in the app
> schema. However, they don't seem to be talking to each other. No
> obvious error status, just no messages are actually being passed. It
> seems like I must be missing something fundamental here. Note that I
> have not modified lam-conf.lamd. Do I need to do something with that?
> The docs seem to say to leave it to the sysadmin. Not sure where to go
> with that. I've tried -ssi rpi tcp and -ssi rpi lamd in the mpirun
> command, but the results are the same.
> -- David
>
> On Fri, Oct 1, 2010 at 10:44 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On Sep 30, 2010, at 9:15 PM, David Shochat wrote:
>>
>>> Thanks for the quick reply. Would there be any API changes going to
>>> 7.1.4? Or going to Open MPI (assuming we're only using things that
>>> were available in 7.0)?
>>
>> There should not be. Both Open MPI and LAM/MPI implement the standard API, so any MPI functions that you're using in LAM/MPI 7.0 should also be present / unaltered (in terms of C function signature) in LAM/MPI 7.1.x and Open MPI.
>>
>>> Meanwhile, we have learned (by using truss on the sending process)
>>> that the failure is on the sending side (we can see a TCP failure
>>> followed by an unsuccessful retry) even though MPI_Bsend() is not
>>> returning an error status.
>>
>> Weird.
>>
>> Does dmesg return anything useful?
>>
>> I only half-care about the solution to that question -- if you can upgrade to newer LAM or Open MPI, it's only worthwhile to pursue that question if the same problem occurs.
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>