LAM/MPI General User's Mailing List Archives

From: Prabhanjan Kambadur (pkambadu_at_[hidden])
Date: 2004-06-23 18:15:23


Hi,

Could you find out whether you are getting this error message before or
after MPI_INIT? Other than for dynamic processes (MPI-II), LAM does not
open additional file descriptors after MPI_INIT. With the tcp rpi, LAM
opens a fully connected set of TCP connections, so the number of fd's it
opens depends on the number of processes in your job. Beyond that, LAM
adds only a small number of fd's (for communication with the lamd's,
stdin, stdout, etc.). So, if you start 1000 processes on a node, the
per-process fd limit should be a bit more than 1000, and the system-wide
limit should be about 1000 times that number.
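
As a rough illustration of the arithmetic above, here is a minimal Python
sketch. It assumes one socket per peer process under the tcp rpi, and the
`overhead` value is a guessed allowance for the lamd, stdin, stdout, etc.
descriptors, not a figure documented by LAM. The helper names
`estimate_fds` and `fits_soft_limit` are hypothetical.

```python
import resource


def estimate_fds(nprocs, overhead=16):
    """Rough fd estimate for a fully connected TCP job running on one node.

    overhead is an assumed allowance for lamd communication, stdin,
    stdout, etc.; it is not a number taken from LAM itself.
    """
    per_process = (nprocs - 1) + overhead  # one socket per peer, plus extras
    system_wide = nprocs * per_process     # every process lives on this node
    return per_process, system_wide


def fits_soft_limit(per_process):
    """Compare the estimate with this process's soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY only means the OS reports "unlimited"; the real
    # ceiling may still be lower, so treat a True result with caution.
    return soft == resource.RLIM_INFINITY or per_process <= soft


per_process, system_wide = estimate_fds(1000)
print("per-process estimate:", per_process)
print("system-wide estimate:", system_wide)
print("fits current soft limit:", fits_soft_limit(per_process))
```

If the per-process estimate comes out close to the real ceiling on your
system, the job is likely to fail during startup rather than later in the
run.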

Another alternative might be to use the sysv or usysv rpi's, which would
reduce the fd count for you. You might also want to find out whether the
application is leaking file descriptors; tools such as lsof can help
with that.
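
Besides lsof, a process can count its own open descriptors before and
after a suspect region of code. The following is only a Python sketch of
that idea; the helper name `count_open_fds` and the 65536 scan cap are
assumptions made for illustration. In a C or Fortran MPI application the
same probe could be done with fstat() on each descriptor number.

```python
import os
import resource


def count_open_fds(max_fd=None):
    """Count descriptors open in this process by probing each fd number."""
    if max_fd is None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # An "unlimited" soft limit is reported as RLIM_INFINITY, so cap
        # the scan at an arbitrary (assumed) upper bound.
        cap = 65536
        max_fd = cap if soft == resource.RLIM_INFINITY else min(soft, cap)
    count = 0
    for fd in range(max_fd):
        try:
            os.fstat(fd)  # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        count += 1
    return count


before = count_open_fds()
read_end, write_end = os.pipe()  # stand-in for code that opens descriptors
during = count_open_fds()
os.close(read_end)
os.close(write_end)
after = count_open_fds()
print("before:", before, "during:", during, "after:", after)
```

A count that keeps growing across iterations of the application's main
loop would point to a leak in the application rather than to LAM.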

Hope this helps,
Anju

On Wed, 23 Jun 2004, Andy Young wrote:

> Hi LAM Developers and users,
> I am running LAM MPI for use with the
> "Global_Arrays" programs written at the Pacific Northwest
> National Lab.
> http://www.emsl.pnl.gov/docs/global/ga.html
> It has been working great across a cluster of several
> SMP's, but I've been having a problem with jobs on 1
> SMP. I get an error that says "Too Many Open Files."
> I talked to an AIX developer, and he points out that
> if you set your file descriptor limit to its maximum,
> the OS reports it as unlimited, but in reality it is
> limited to 10,000 per process.
> Is it possible to approach that limit while using
> LAM-MPI? Are there still any known issues with
> removing old fd's?
> Here is my laminfo output:
> LAM/MPI: 7.0.4
> Prefix: /usr/local/lam_mpi
> Architecture: powerpc-ibm-aix5.2.0.0
> Configured by: andy
> Configured on: Wed Jun 9 18:33:22 EDT 2004
> Configure host:
> C bindings: yes
> C++ bindings: no
> Fortran bindings: yes
> C profiling: no
> C++ profiling: no
> Fortran profiling: no
> ROMIO support: no
> IMPI support: no
> Debug support: no
> Purify clean: no
> SSI boot: globus (Module v0.5)
> SSI boot: rsh (Module v1.0)
> SSI coll: lam_basic (Module v7.0)
> SSI coll: smp (Module v1.0)
> SSI rpi: crtcp (Module v1.0.1)
> SSI rpi: lamd (Module v7.0)
> SSI rpi: sysv (Module v7.0)
> SSI rpi: tcp (Module v7.0)
> SSI rpi: usysv (Module v7.0)
>
> I can't think of what else I should include to help
> describe the problem. I have been unable to replicate
> the error with dbx. Also, AIX is not POSIX compliant,
> but is Unix98 compliant (I doubt this will help much.)
>
> I have been running with rsh boot and the tcp RPI.
> Many Thanks,
> Andy
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>