
LAM/MPI General User's Mailing List Archives


From: jess michelsen (jam_at_[hidden])
Date: 2003-10-20 10:10:51


Hi Sebastian!

I've just had a similar error message, except the (Fortran) code failed
at an MPI_WAITALL. Since this was the first application I tested under
LAM, and it had been running under other MPI installations, I initially
thought that my LAM installation had not been done correctly. However,
it soon turned out that part of the code was running in parallel for
the first time. In that part, an MPI_ISEND was missing (due to a flawed
IF-THEN-ELSE construct)! Did you check whether the message that proc 0
is waiting for was ever sent?
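
For illustration only, here is a minimal sketch (with invented names,
nothing from our actual codes) of how a branch that forgets to post its
MPI_ISEND ends up hanging the other rank in MPI_WAITALL:

    program isend_bug_sketch
       implicit none
       include 'mpif.h'
       integer :: rank, nbr, ierr, reqs(2)
       integer :: stats(MPI_STATUS_SIZE, 2)
       double precision :: sbuf(10), rbuf(10)

       call MPI_INIT(ierr)
       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
       nbr  = 1 - rank              ! assumes exactly two processes
       sbuf = rank
       reqs = MPI_REQUEST_NULL

       ! flawed IF-THEN-ELSE: only rank 0 ever posts its send
       if (rank == 0) then
          call MPI_ISEND(sbuf, 10, MPI_DOUBLE_PRECISION, nbr, 99, &
                         MPI_COMM_WORLD, reqs(1), ierr)
       end if

       ! both ranks expect a message from their neighbour
       call MPI_IRECV(rbuf, 10, MPI_DOUBLE_PRECISION, nbr, 99, &
                      MPI_COMM_WORLD, reqs(2), ierr)

       ! rank 0 never receives its message, so this call never returns
       call MPI_WAITALL(2, reqs, stats, ierr)
       call MPI_FINALIZE(ierr)
    end program isend_bug_sketch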

Best regards, Jess Michelsen

On Thu, 2003-10-16 at 17:59, Brian W. Barrett wrote:
> It's pretty hard to pin down the problem based on the information you
> provided. The LAM team no longer has access to HPUX machines, so it is
> possible that something changed that causes problems on HPUX. On the
> other hand, it is not unusual for codes developed on one platform to
> have odd issues on another platform (even MPI apps :) ).
>
> Have you run the LAM/MPI test suite (available for download on the same
> page as the 7.0.2 release)? It does a reasonably good job of poking
> around in the MPI implementation. If it passes, then I would start
> looking at the user code. The complaint is that one of the two
> processes died either in the barrier or before the barrier started.
> Without seeing the code, I could not make any guesses as to the cause
> of the death. A debugger may be helpful here.
>
> Hope that helps,
>
> Brian
>
>
> On Thursday, October 16, 2003, at 05:38 AM, Sebastian Henkel wrote:
>
> > I have installed LAM/MPI 7.0.2 on an HPUX 10.20 workstation. The
> > compilation went successfully and I could run the examples and several
> > other test cases without problems on several nodes. As requested on the
> > web page at "lam dash mpi dot org", I have attached the config.log file
> > and the laminfo output to this mail.
> >
> > The problem I am having is with a CFD program developed with LAM/MPI on
> > RH-Linux. According to the developers it runs without problems there.
> > When started on two nodes on the same processor it works without any
> > problems. If I start the case on two nodes, each on a different
> > workstation, the program doesn't get past MPI_BARRIER. I suppose it
> > could be a problem of HPUX 10.20, but I don't know.
> >
> > I need help, as I am not familiar with MPI in any way beyond compiling
> > it.
> >
> > I run MPI with the following command (tcp is the default):
> >
> > mpirun n0 n1 program
> >
> > The error I receive is:
> >
> > MPI_Recv: process in local group is dead (rank 0, MPI_COMM_WORLD)
> > Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> > Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> > Rank (0, MPI_COMM_WORLD): - MPI_Barrier()
> > Rank (0, MPI_COMM_WORLD): - main()
> >
> > Having added write statements to the program, I know it always crashes
> > when calling MPI_BARRIER.
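> >
> > Schematically (the variable names here are placeholders, not from the
> > actual code), the instrumentation is simply:
> >
> >    write(*,*) 'rank ', myrank, ': before MPI_BARRIER'
> >    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
> >    write(*,*) 'rank ', myrank, ': after MPI_BARRIER'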
> >
> > The CFD program and its MPI calls are written in Fortran 90, compiled
> > using the mpif77 wrapper. When compiling LAM I made sure that mpif77
> > uses f90 as the Fortran compiler.
> >
> > Hopefully someone can give me a hint as to what the reason might be.
> >
> > Best regards
> >
> > Sebastian Henkel
> > --
> >
> > Dipl.-Ing. Sebastian Henkel, Naval Architect, TKB Basic Design
> >
> > Tel. : +49 461 4940-508   FLENSBURGER SCHIFFBAU-GESELLSCHAFT mbH & Co. KG
> > Fax  : +49 461 4940-217   Batteriestrasse 52, D-24939 Flensburg, Germany
> > E-Mail: henkel at fsg-ship dot de
> >
> >
> > <config.log.gz><laminfo.out>
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >
> --
> Brian Barrett
> LAM/MPI developer and all around nice guy
> Have a LAM/MPI day: http://www.lam-mpi.org/
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/