On Thu, 12 Feb 2004, Sergei Lisenkov wrote:
> While running my code, I got this message:
> --------------------------------------------------------
> It seems that rank 0 was not able to allocate additional
> DMA-accessible memory for Myrinet. DMA-accessible memory is memory
> which the Myrinet cards can access. Typically, OS's have fixed limits
> on how the total amount of DMA memory can be allocated at one
> time. Long MPI messages require that a large amount of DMA-accessible
> memory be allocated. If possible, try using smaller messages or
> adjusting the OS's DMA limit. :-(
>
> GM failed to allocate a DMA block of 1052 bytes.
> -----------------------------------------------------------------------------

What is happening here is exactly what the message describes -- the gm
library failed either to allocate more "registered" (or "pinned") memory
or to register some already-allocated memory. This is a limitation of the
OS rather than of gm/Myrinet or LAM.

What OS are you using? I know that Solaris defaults to a fairly small
amount of memory that can be registered -- it is generally a good idea to
increase that limit (look in /etc/system, if I recall correctly...).

What happens is that LAM registers the pages containing user buffers that
are passed down through MPI_Send/MPI_Recv (etc.). When a communication
completes, if there are no other pending communications on a given page,
that page becomes eligible to be un-registered.
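
To make the bookkeeping above concrete, here is a toy model in Python. It is
only a sketch of the idea (a count of pending communications per page), not
LAM's or gm's actual code; the 4096-byte page size and the names PinTracker,
start, complete, and busy_pages are assumptions made up for illustration.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Return the page numbers touched by `length` bytes starting at `addr`."""
    if length <= 0:
        return range(0)
    first = addr // page_size
    last = (addr + length - 1) // page_size
    return range(first, last + 1)


class PinTracker:
    """Hypothetical model: count pending communications on each page."""

    def __init__(self):
        self.pending = Counter()

    def start(self, addr, length):
        """A send/receive on this buffer begins; its pages must stay pinned."""
        for page in pages_spanned(addr, length):
            self.pending[page] += 1

    def complete(self, addr, length):
        """A communication finishes; return pages now eligible to un-register."""
        eligible = []
        for page in pages_spanned(addr, length):
            self.pending[page] -= 1
            if self.pending[page] == 0:
                del self.pending[page]
                eligible.append(page)
        return eligible

    def busy_pages(self):
        """Pages that still have at least one pending communication."""
        return sorted(self.pending)
```

In this model, two buffers that share a page keep that page pinned until both
communications have completed, which is why many overlapping pending requests
can hold a lot of registered memory at once.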

Other solutions include:
- lessening the number of pending sends and receives
- sending shorter messages that complete quickly
- ensuring that you don't have any memory leaks and are not otherwise
  orphaning memory that may have been registered by LAM/gm
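
As a rough illustration of the first two suggestions, the Python sketch below
estimates how many bytes are in flight at once when a long message is split
into chunks and only a fixed number of sends may be pending. It is a
back-of-the-envelope model, not MPI code: the function names are hypothetical,
and it assumes sends complete in the order they were posted.

```python
from collections import deque


def chunk_sizes(total_bytes, chunk_bytes):
    """Split a message of total_bytes into chunks of at most chunk_bytes."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    full, rest = divmod(total_bytes, chunk_bytes)
    return [chunk_bytes] * full + ([rest] if rest else [])


def peak_pending_bytes(sizes, max_pending):
    """Peak bytes in flight when at most max_pending sends are outstanding.

    Assumes the oldest pending send completes before a new one is posted
    once the window is full.
    """
    if max_pending < 1:
        raise ValueError("max_pending must be at least 1")
    window = deque()
    in_flight = 0
    peak = 0
    for size in sizes:
        if len(window) == max_pending:
            in_flight -= window.popleft()  # oldest send completes
        window.append(size)
        in_flight += size
        peak = max(peak, in_flight)
    return peak
```

For example, under these assumptions a 10-byte message sent whole keeps 10
bytes pending, while the same message in 4-byte chunks with at most two sends
outstanding never has more than 8 bytes pending. The same reasoning applies at
real message sizes: smaller chunks and a smaller window bound the memory that
has to be registered at any one time.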
Hope this helps.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/