
LAM/MPI General User's Mailing List Archives


From: McCalla, Mac (macmccalla_at_[hidden])
Date: 2003-07-25 15:48:53


>From: Jeff Squyres
>Sent: Friday, July 25, 2003 6:34 AM
>Subject: Re: FW: LAM: error msgs ---Pbind , YPBindProc_domain, and mpirun: cannot start

>>> >The invalid address tag is an odd one -- it's actually a lamd error
>>> >indicating that there was some kind of problem in the LAM session
>>> >directory.

Could you give me a little more info on what "some kind" of problem might
be? Also, any guidance on adding some debugging code to the various
flatd.c routines where this is detected would be appreciated.
 

>> >Is the user's job script running mpirun a large number of times?
>>
>> Yes, hundreds. Is there some inherent limit on number of mpirun
>> executions per lamboot?

>There *shouldn't* be, but a huge number of runs in a single universe is
>not something that we have tested extensively. When you say "hundreds",
>do you know if it's always the same number of runs that causes the
>problems (e.g., the internal tag number is overflowing and the lamd is
>not handling it properly)? Do you know about how many runs it takes
>before you run into this problem?

The number of runs varies from 2 to over a hundred. I don't think this is
a "hitting a limit" kind of problem really. (This statement of course
guarantees that it will turn out to be exactly that..8^) )
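
To pin down the run count more precisely, something like the following
could wrap the job step. This is only a rough sketch, not anything from
LAM itself: the helper name run_until_failure is made up for illustration,
and the command passed in is whatever mpirun line the job script already
uses. It reruns the command inside one universe and reports the first run
that exits non-zero, along with its stderr.

```python
import subprocess


def run_until_failure(cmd, max_runs):
    """Run cmd (a list of arguments) up to max_runs times.

    Returns (run_number, returncode, stderr_text) for the first run that
    exits with a non-zero status, or None if every run succeeded.
    This helper is hypothetical; it is not part of LAM/MPI.
    """
    for run in range(1, max_runs + 1):
        # Capture output so the error text from the failing run is kept.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return run, result.returncode, result.stderr
    return None
```

Calling it repeatedly after a fresh lamboot, with the job's own mpirun
command as the argument list, and comparing the returned run numbers would
show whether the failure always lands on the same run or really does vary.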

thanks for your time,

mac mccalla