Just a follow-up for the list archives. Jeremy currently thinks that
this is a hardware / operating system problem. Something painful-sounding,
like the NIC processors overheating :(.
Brian
On Thu, 5 Sep 2002, jeremy archuleta wrote:
> i believe that lam6.6b1 has problems in running a lot
> (as in 100's) of successive jobs. at least, running my
> code ; )
>
> it appears that lam6.6b1 is not releasing memory
> correctly, because when lam refuses to run anymore,
> memory usage, given by "top", is around 95% and my
> code only uses 16MB spread out over 4 nodes for each
> run with 2 "malloc's" and 2 "free's".
>
> but i have also found that lam6.5.6 can run without
> problem for 1140+ runs (i killed it).
>
> one last thing, when it stops i can then "lamhalt",
> but i can't "lamboot" because lamboot can't boot the
> origin node, but it can boot the remote nodes.
>
> i am curious to know if anyone else has had these
> types of problems. if you haven't, but would like to
> test my code, i can send it to you...it's a small heat
> equation code.
>
> that's it.
> oh. and this version passed all the lamtests when i
> installed it.
> -j
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/