LAM/MPI General User's Mailing List Archives


From: Anthony Chan (chan_at_[hidden])
Date: 2005-12-13 16:57:34


On Mon, 12 Dec 2005, Qiang Xu wrote:

> [qiang_lam_at_compute-0-1 mympiProftest]$ mpicc -c testProf.c
> [qiang_lam_at_compute-0-1 mympiProftest]$ ar -rc ~/lam-7.1.1/lib/libmyprof.a testProf.o
> [qiang_lam_at_compute-0-1 mympiProftest]$ mpif77 -o m m.o b.o -L ~/lam-7.1.1/lib/ -llamf77mpi -lmyprof
> [qiang_lam_at_compute-0-1 mympiProftest]$ mpirun -np 2 m
>
> LAM_MPI_Fortran_program#
> breakpoint1
>
> LAM_MPI_Fortran_program#
> breakpoint1
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 28533 failed on node n0 (192.168.1.253) due to signal 11.
> -----------------------------------------------------------------------------
>
> But after linking with the profiling lib, which works fine with MPICH,
> it did not work with LAM.
>
> And the MPI_Init( ) in the profiling lib is as follows:
> int MPI_Init( argc, argv )
> int * argc;
> char *** argv;
> { .......
> printf("\n");
> for (i=0;i<(*argc);i++)
> {printf("%s#",(*argv)[i]);}
>
> startTick=times(NULL);
> printf("\n breakpoint1 \n");
> returnVal = PMPI_Init( argc, argv );
> printf("\n breakpoint2 \n");
> stopTick=times(NULL);
> ..........
> }
>
> PMPI_Init(argc, argv) is the problem, why? And why does argv[ ] only contain one element, "LAM_MPI_Fortran_program"?
>
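
The quoted wrapper records startTick and stopTick with times(NULL). A minimal sketch of turning that tick difference into seconds, assuming a POSIX system. The helper names ticks_to_seconds() and now_ticks() are hypothetical, and now_ticks() passes a struct tms buffer because POSIX does not specify times(NULL) (Linux accepts a NULL argument, many other UNIX systems do not).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: convert a difference of times() return values
 * into seconds. sysconf(_SC_CLK_TCK) is the number of ticks per second.
 * Returns -1.0 if the tick rate cannot be determined. */
static double ticks_to_seconds(clock_t start, clock_t stop)
{
    long hz = sysconf(_SC_CLK_TCK);
    if (hz <= 0)
        return -1.0;
    return (double)(stop - start) / (double)hz;
}

/* Hypothetical helper: read the tick counter portably by giving
 * times() a real struct tms buffer instead of NULL. */
static clock_t now_ticks(void)
{
    struct tms t;
    return times(&t);
}
```

In the wrapper, startTick = now_ticks() before PMPI_Init() and stopTick = now_ticks() after it would give ticks_to_seconds(startTick, stopTick) as the time spent in initialization.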

Can you call fflush(stdout) after each printf() to make sure that the code
really aborts before returning from PMPI_Init()? It would also help if you
could say what version of LAM-MPI you are using.
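
The reason for the fflush() check: text written with printf() may still be sitting in the stdio buffer when signal 11 kills the process, especially when mpirun connects stdout to a pipe rather than a terminal, so a missing "breakpoint2" does not by itself prove where the crash happened. A minimal sketch of a checkpoint helper that flushes after every message; the name checkpoint() is hypothetical, and writing to stderr (which is not fully buffered) is an alternative.

```c
#include <stdio.h>

/* Hypothetical helper for the profiling MPI_Init() wrapper: write a
 * checkpoint label and flush immediately, so the text reaches the
 * output even if the process is killed by a signal right afterwards.
 * Returns 0 on success, EOF on a write or flush error. */
static int checkpoint(FILE *out, const char *label)
{
    if (fprintf(out, "\n %s \n", label) < 0)
        return EOF;
    return fflush(out);
}

/* Intended use inside the wrapper (requires <mpi.h>):
 *
 *   checkpoint(stdout, "breakpoint1");
 *   returnVal = PMPI_Init(argc, argv);
 *   checkpoint(stdout, "breakpoint2");
 */
```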

Another option is to get mpe2-1.0.3p1 from http://www.mcs.anl.gov/perfvis
and compile it against the version of LAM-MPI that you are using, i.e.
link with -lmpe_f2cmpi instead of -llamf77mpi, and see whether the same
error occurs. (mpe2-1.0.3p1 works with recent versions of LAM-MPI and OpenMPI.)

In terms of capturing the Fortran application name in a C profiling library,
it is a tricky issue. AFAIK, it depends on the Fortran compiler and the
facilities it provides for accessing command-line arguments. I believe
that this is why LAM-MPI provides a generic name,
LAM_MPI_Fortran_program, to avoid the complication. (LAM-MPI experts can
correct me if I am wrong.) In general, it has nothing to do with MPI.
Some MPI implementations use command-line arguments to start up the
parallel job; with those, you will see altered command-line arguments in
your profiled MPI_Init(). For those implementations, if you were to hack
into the implementation code, you may be able to fetch the Fortran
application name.
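
One application-side workaround for the name question, sketched under two assumptions: the profiling library runs on Linux, where /proc/self/exe is a symbolic link to the running executable, and LAM passes the literal placeholder LAM_MPI_Fortran_program as argv[0] for Fortran programs (as in the output above). The helper guess_program_name() is hypothetical; it is not part of LAM-MPI or MPE.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Placeholder name LAM/MPI reports for Fortran programs (from this thread). */
#define LAM_FORTRAN_PLACEHOLDER "LAM_MPI_Fortran_program"

/* Hypothetical helper: copy a best-guess program name into buf.
 * Uses argv[0] unless it is the LAM placeholder, in which case it falls
 * back to the Linux-specific /proc/self/exe link.
 * Returns 0 on success, -1 if no name could be determined. */
static int guess_program_name(int argc, char **argv, char *buf, size_t len)
{
    if (len == 0)
        return -1;
    if (argc > 0 && argv != NULL && argv[0] != NULL &&
        strcmp(argv[0], LAM_FORTRAN_PLACEHOLDER) != 0) {
        snprintf(buf, len, "%s", argv[0]);
        return 0;
    }
    /* readlink() does not NUL-terminate; reserve one byte for it.
     * A path longer than len - 1 bytes is silently truncated. */
    ssize_t n = readlink("/proc/self/exe", buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

Inside the wrapper this would be called with *argc and *argv before PMPI_Init(); on systems without /proc it simply returns -1 for Fortran programs.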

A.Chan