LAM/MPI General User's Mailing List Archives

From: Roger Mabe (mabe_at_[hidden])
Date: 2003-11-24 08:32:32


        Our LAM consists of 11 nodes, all but two running Red Hat Linux.
Of the other two, one is an Alpha running Tru64 Unix and one is a Mac
running Debian PPC Linux. This is the only MPI application on which I have
noticed the problem. However, we don't have any other MPI applications that
produce output on the same scale. The problem is odd in that it does not
manifest itself at the same point in the program. I have a script that
calls the application several times over different data sets. Sometimes
the problem occurs on the first call of the application; other times it
does not occur until the fourth or fifth call.
        I have tried sending stdout to a file, to the terminal, and to
/dev/null, and the result is the same: at some point the output slows to a
crawl. I traced the slow output to a single node, so I shut the LAM down,
removed that node, and restarted the LAM. When I started the application
again, the slow output recurred, this time from a different node.
        Any help you can give would be appreciated.
Thanks
Roger M. Mabe
Naval Research Laboratory
PH: 202-404-5481
Email: mabe_at_[hidden]
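
One way to pin down when the output stalls, independent of which node it comes from, is to timestamp each line as it arrives on the mpirun side. The following is a minimal sketch, assuming Python 3 is available on the node where mpirun is started. The function names `stamp` and `find_stalls` and the 1-second threshold are hypothetical choices for illustration and are not part of LAM/MPI.

```python
import sys
import time


def stamp(stream, clock=time.monotonic):
    """Yield (arrival_time, line) for each line read from `stream`.

    Arrival times are approximate: input is read in buffered chunks, so
    lines from the same chunk get nearly identical timestamps.
    """
    for line in stream:
        yield clock(), line


def find_stalls(stamped_lines, threshold=1.0):
    """Return (line_number, gap_seconds) for every line that arrived more
    than `threshold` seconds after the line before it."""
    stalls = []
    previous = None
    for number, (arrival, _line) in enumerate(stamped_lines, start=1):
        if previous is not None and arrival - previous > threshold:
            stalls.append((number, arrival - previous))
        previous = arrival
    return stalls


if __name__ == "__main__":
    # Report each point where the forwarded output paused.
    for number, gap in find_stalls(stamp(sys.stdin)):
        print(f"line {number}: arrived {gap:.2f}s after the previous line")
```

The script would be used by piping the application's stdout into it. Comparing the reported line numbers across several runs could show whether the slowdown starts after a similar volume of output each time, which may help narrow down whether the stdio forwarding is involved.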

-----Original Message-----
From: Brian Barrett [mailto:brbarret_at_[hidden]]
Sent: Friday, November 21, 2003 6:33 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: Program Slows with lam 7.0.3

On Nov 21, 2003, at 10:57 AM, Roger Mabe wrote:

> I have an MPI program which runs with little to no problems under lam
> 6.5.9. After upgrading to 7.0.3 the program now completes (all the
> nodes stop executing) but does not exit. The program generates a
> large amount of stdio output (status reports, etc, up to about a
> 12-13Mb file for each run if redirected). If I redirect the stdio to a
> file, the file continues to be updated albeit at a snail's pace. The
> same occurs if I dump to a terminal, the program generates the data
> requested but doesn't complete because (it appears) the stdio dump is
> not complete but is dumping to the screen at a very slow rate.
> I am lambooting my nodes with the default settings and I'm running
> my program with mpirun -ssi rpi tcp. Has anyone experienced this or
> know why this is occurring. Thanks.

It is possible that you have found a bug in the stdio forwarding code
in LAM. I can't think of any particular change that would drastically
slow down stdio forwarding, but there was one small change in lamd
communication that could cause the problem. What platform are you
running on? Also, does this happen for any application you run that
produces output, or just this one application?

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/