On Nov 15, 2005, at 5:55 PM, Joshua Mora wrote:
> I am a newbie on LAM over IB and I couldn't see among the 5 parameters,
> which one affects the memory that gets pinned.
> It talks about memory managers (I use the default ptmalloc2), envelopes
> and tiny message criterion of 1024, hca and port.
> I assume that hca and port are fine since it works with 4 processes
> within a node but when I want to use more than 2 processes per node it
> fails. What should I try if it even fails a 'hello world' app. Can you
> tell me a simple example about what parameter to try ?
You're correct that the _id and _port params should not be necessary
(we really only included them for correctness, in anticipation of "odd"
cases where LAM couldn't derive the desired hca/port IDs by itself).
Additionally, the _priority parameter shouldn't matter, either.
It's the _num_envelopes and _tinymsglen parameters that are the issue.
By default, LAM pins 64 message buffers of 1024 bytes each for every
peer in the job. So in your case, each of your 4 processes is going to
automatically pin [nominally] 64 KB of memory for each peer. But it's
actually more than that for a few reasons:
- each tiny message is accompanied by a header (so it's really like
1050 bytes or so -- I don't remember the exact value offhand).
- each buffer is individually malloc'ed
- memory is actually pinned by page, not by byte range
Memory is also pinned for large messages, but that shouldn't affect
startup/MPI_INIT problems. My point here is to try decreasing the
number of envelopes to, say, 32 or 16 or 8, and see what happens. As
described above, LAM's pre-pinning of resources is quite greedy and
increases linearly with the number of processes in your job. Open MPI
does a much better job with this, actually (it uses an IB shared
receive queue, if available, and therefore the pinned resource usage
is independent of the number of processes in the job).
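To put rough numbers on the above, here is a minimal shell sketch of the arithmetic. It is only an estimate under stated assumptions, not LAM's actual accounting: the function name estimate_pinned_bytes is made up for illustration, 1050 bytes per buffer is the approximate figure mentioned above, the 4096-byte page size is assumed, and each buffer is assumed to pin whole pages of its own. Whether a process counts itself as a peer is not something I am asserting here; the example uses 3 peers for a 4-process job.

```shell
# Rough estimate of memory pre-pinned by ONE process for tiny messages.
# Assumptions (illustrative, not exact LAM values):
#   - buf_bytes is the tiny message length plus header (~1050 bytes)
#   - pages are 4096 bytes unless a 4th argument says otherwise
#   - every buffer pins its own whole page(s)
estimate_pinned_bytes() {
    envelopes=$1      # value of rpi_ib_num_envelopes
    buf_bytes=$2      # tiny message length plus header, in bytes
    peers=$3          # number of peers buffers are pre-pinned for
    page=${4:-4096}   # page size in bytes (assumed)
    # Round each buffer up to a whole number of pages.
    pages_per_buf=$(( (buf_bytes + page - 1) / page ))
    echo $(( envelopes * pages_per_buf * page * peers ))
}

# Default of 64 envelopes versus a reduced 16, each for 3 peers.
estimate_pinned_bytes 64 1050 3
estimate_pinned_bytes 16 1050 3
```

Under these assumptions the two calls print 786432 and 196608, i.e. page rounding alone makes the per-peer cost several times the nominal 64 KB, and the total scales with both the envelope count and the number of peers.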
So try this:
    mpirun -ssi rpi_ib_num_envelopes 32 -np 4 your_app_name
and see what happens.
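If 32 still fails, the same command can be retried with 16 and then 8. A small shell sketch of that loop follows. The function name try_envelopes and the RUN variable are hypothetical helpers, not part of LAM, and the loop assumes mpirun returns a nonzero exit status when the job fails to start.

```shell
# Try progressively smaller envelope counts until the job starts.
# RUN is a hypothetical dry-run hook: with RUN=echo the mpirun command
# lines are only printed, not executed.
try_envelopes() {
    app=$1   # path to your MPI application
    np=$2    # number of processes to launch
    for n in 32 16 8; do
        if ${RUN:-} mpirun -ssi rpi_ib_num_envelopes "$n" -np "$np" "$app"; then
            echo "succeeded with rpi_ib_num_envelopes=$n"
            return 0
        fi
    done
    echo "failed with 32, 16 and 8 envelopes" >&2
    return 1
}

# Example (not run here): try_envelopes ./your_app_name 4
```

If none of the three values works, that suggests the problem may not be the tiny-message buffers at all, which is one more reason to post the exact error output.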
> Is there a way of understanding the reason of this pinning memory
> problem
> and how that relates to the ssi rpi ib parameters ?
If the system runs out of pinnable memory, your processes will abort.
> Do you think is a limitation of my system? How can I verify that.
> Is there an easy way to trace where the problem is?
Can you post the exact warning/error messages that are preventing you
from running? I'm only guessing that you're having memory problems
based on what you have described; seeing the exact output would be
helpful.
> Thank you for your patience.
>
> -----Original Message-----
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Tuesday, November 15, 2005 1:41 PM
> To: General LAM/MPI mailing list
> Subject: Re: LAM: help with LAM over IB
>
> Look at section 9.3.4 of the LAM/MPI User's Guide -- it talks
> specifically about the ib RPI and the SSI parameters that are
> available. There's a bunch of descriptions in there about the pros and
> cons of increasing/decreasing the pinned memory values, etc.
>
>
> On Nov 15, 2005, at 12:28 PM, Joshua Mora wrote:
>
>> Thank you Jeff for your reply.
>> I read the documentation but I couldn't find any parameters that
>> modify the pinned memory at startup for ssi rpi ib module.
>> Thanks.
>>
>> -----Original Message-----
>> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
>> Sent: Tuesday, November 15, 2005 9:38 AM
>> To: General LAM/MPI mailing list
>> Subject: Re: LAM: help with LAM over IB
>>
>> On Nov 14, 2005, at 10:11 AM, Joshua Mora wrote:
>>
>>> I am trying to use LAM 7.1.1 and latest betas to spawn more than 2
>>> processes per node (4 processors per node) using module ssi rpi ib.
>>> Any other modules are working fine.
>>
>> Are you saying that you can't launch more than 2 processes on a node
>> using IB?
>>
>> If so, it could well be because of pinned memory limitations. You can
>> use run-time LAM SSI parameters to decrease the amount of memory that
>> is pinned at startup -- see the LAM/MPI User's Guide
>> (http://www.lam-mpi.org/using/docs/) for specific details about the
>> Infiniband RPI SSI parameters.
>>
>> --
>> {+} Jeff Squyres
>> {+} The Open MPI Project
>> {+} http://www.open-mpi.org/
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/
>
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/