Hi Anju,
Thanks for replying to this post. The error is
definitely occurring AFTER MPI_INIT, so it can't be
a limit that LAM itself is running into. My
computations only have 4 processes open, so LAM
won't have reached an FD limit. I think I'll have
to review the use of FDs in the GLOBAL_ARRAYS
programs.
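
As a starting point for reviewing FD use, here is a minimal sketch of how the open descriptors of a process could be counted from inside the process. It is written in Python for brevity and is not part of LAM or GLOBAL_ARRAYS. The names `open_fds` and `scan_limit` are made up for illustration, and the default of 10000 is an assumption taken from the per-process figure quoted for AIX in this thread.

```python
import os
import resource


def open_fds(scan_limit=10000):
    """Return the list of descriptor numbers currently open in this process.

    scan_limit is a hypothetical cap on how many descriptor numbers to probe;
    it matters when the OS reports the soft limit as "unlimited".
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        upper = scan_limit
    else:
        upper = min(soft, scan_limit)

    found = []
    for fd in range(upper):
        try:
            os.fstat(fd)  # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        found.append(fd)
    return found


if __name__ == "__main__":
    print("open descriptors:", len(open_fds()))
```

Calling something like this before and after a suspect phase of the run and comparing the two counts would show whether descriptors are being left open.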
Many Thanks,
Andy
--- Prabhanjan Kambadur <pkambadu_at_[hidden]> wrote:
>
> Hi,
>
> Could you find out whether you are getting this error
> message before or after MPI_INIT? Other than for dynamic
> processes (MPI-II), LAM does not add file descriptors
> after MPI_INIT. In LAM, we open a fully connected set of
> TCP connections if you are using the tcp rpi. The number
> of fd's that LAM opens depends on the number of processes
> you are running in your job. LAM adds only a small number
> of fd's beyond this (for communication with lamd's, stdin,
> stdout, etc). So, if you start up 1000 processes per node,
> then the per-process fd limit should be a bit more than
> 1000 and the system-wide limit should be 1000 times the
> previous number.
>
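
To make the fd arithmetic for the tcp rpi concrete, here is a rough sketch in Python. It assumes one socket per peer in a fully connected job plus a small fixed overhead. The function names and the `overhead` value of 8 are assumptions for illustration only, not numbers taken from LAM.

```python
def estimate_fds_per_process(procs_in_job, overhead=8):
    """Rough fd estimate for one process in a fully connected TCP job.

    overhead is an assumed allowance for lamd communication and
    stdin/stdout/stderr; the real number depends on the LAM build.
    """
    if procs_in_job < 1:
        raise ValueError("procs_in_job must be at least 1")
    peers = procs_in_job - 1  # one connection to every other process
    return peers + overhead


def estimate_fds_per_node(procs_in_job, procs_on_node, overhead=8):
    """System-wide estimate: per-process estimate times processes on the node."""
    return procs_on_node * estimate_fds_per_process(procs_in_job, overhead)


if __name__ == "__main__":
    print(estimate_fds_per_process(4))
    print(estimate_fds_per_node(1000, 1000))
```

Under these assumptions a 4-process job needs on the order of ten descriptors per process, which is nowhere near a 10,000 per-process ceiling.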
> Another alternative might be to use the sysv or usysv
> rpi's. This would reduce the fd count for you. Another
> thing you might want to do is find out whether the
> application is leaking file descriptors. There are many
> tools for this, such as lsof.
>
> Hope this helps,
> Anju
>
> On Wed, 23 Jun 2004, Andy Young wrote:
>
> > Hi LAM Developers and users,
> > I am running LAM MPI for use with the "Global_Arrays"
> > programs written at the Pacific Northwest National Lab.
> > http://www.emsl.pnl.gov/docs/global/ga.html
> > It has been working great across a cluster of several
> > SMPs, but I've been having a problem with jobs on 1
> > SMP. I get an error that says "Too Many Open Files."
> > I talked to an AIX developer, and he pointed out that
> > if you set your file descriptor limit to its maximum,
> > the OS reports it as unlimited, but in reality it is
> > limited to 10,000 per process.
> > Is it possible to approach that limit while using
> > LAM-MPI? Are there still any known issues with
> > removing old fd's?
> > Here is my laminfo output:
> > LAM/MPI: 7.0.4
> > Prefix: /usr/local/lam_mpi
> > Architecture: powerpc-ibm-aix5.2.0.0
> > Configured by: andy
> > Configured on: Wed Jun 9 18:33:22 EDT 2004
> > Configure host:
> > C bindings: yes
> > C++ bindings: no
> > Fortran bindings: yes
> > C profiling: no
> > C++ profiling: no
> > Fortran profiling: no
> > ROMIO support: no
> > IMPI support: no
> > Debug support: no
> > Purify clean: no
> > SSI boot: globus (Module v0.5)
> > SSI boot: rsh (Module v1.0)
> > SSI coll: lam_basic (Module v7.0)
> > SSI coll: smp (Module v1.0)
> > SSI rpi: crtcp (Module v1.0.1)
> > SSI rpi: lamd (Module v7.0)
> > SSI rpi: sysv (Module v7.0)
> > SSI rpi: tcp (Module v7.0)
> > SSI rpi: usysv (Module v7.0)
> >
> > I can't think of what else I should include to help
> > describe the problem. I have been unable to replicate
> > the error with dbx. Also, AIX is not POSIX compliant,
> > but is Unix98 compliant (I doubt this will help much.)
> >
> > I have been running with rsh boot and the tcp RPI.
> > Many Thanks,
> > Andy
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >