LAM/MPI General User's Mailing List Archives

From: Tim Prince (n8tm_at_[hidden])
Date: 2011-09-02 11:14:13


On 9/2/2011 9:27 AM, tlange_at_[hidden] wrote:
> Ok, thank you!
>
> The compile seems ok now. However, I have segfaults during run (see
> listing). I had those also on my computer at work, but could fix this by a
> provided replacement for the source file responsible for
> MPI-communication... That doesn't work here. Anyway, is the error listing
> below suspicious for MPI-communication problems?
>
> ###############################################################
> /usr/bin/mpirun.openmpi -np 2 t2eco2n_mp
> [abt1fk06:18827] *** Process received signal ***
> [abt1fk06:18827] Signal: Segmentation fault (11)
> [abt1fk06:18827] Signal code: Address not mapped (1)
> [abt1fk06:18827] Failing at address: 0xf3
> [abt1fk06:18827] [ 0] /lib/libpthread.so.0 [0x7f6579269a80]
> [abt1fk06:18827] [ 1] /usr/lib/libmpi.so.0(MPI_Comm_size+0x4e)
> [0x7f657a71c74e]
> [abt1fk06:18827] [ 2] t2eco2n_mp(parallel_info+0x1e) [0x4b999e]
> [abt1fk06:18827] [ 3] t2eco2n_mp(AZ_set_proc_config+0x2d) [0x4a4b12]
> [abt1fk06:18827] [ 4] t2eco2n_mp(az_set_proc_config_+0xb) [0x4a427f]
> [abt1fk06:18827] [ 5] t2eco2n_mp(cycit_+0x2204) [0x42f014]
> [abt1fk06:18827] [ 6] t2eco2n_mp(MAIN__+0xfcf) [0x43575f]
> [abt1fk06:18827] [ 7] t2eco2n_mp(main+0x2c) [0x4ea29c]
> [abt1fk06:18827] [ 8] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f6578f261a6]
> [abt1fk06:18827] [ 9] t2eco2n_mp [0x4138e9]
> [abt1fk06:18827] *** End of error message ***
> [abt1fk06:18828] *** Process received signal ***
> [abt1fk06:18828] Signal: Segmentation fault (11)
> [abt1fk06:18828] Signal code: Address not mapped (1)
> [abt1fk06:18828] Failing at address: 0xf3
> [abt1fk06:18828] [ 0] /lib/libpthread.so.0 [0x7fd690c5ea80]
> [abt1fk06:18828] [ 1] /usr/lib/libmpi.so.0(MPI_Comm_size+0x4e)
> [0x7fd69211174e]
> [abt1fk06:18828] [ 2] t2eco2n_mp(parallel_info+0x1e) [0x4b999e]
> [abt1fk06:18828] [ 3] t2eco2n_mp(AZ_set_proc_config+0x2d) [0x4a4b12]
> [abt1fk06:18828] [ 4] t2eco2n_mp(az_set_proc_config_+0xb) [0x4a427f]
> [abt1fk06:18828] [ 5] t2eco2n_mp(cycit_+0x2204) [0x42f014]
> [abt1fk06:18828] [ 6] t2eco2n_mp(MAIN__+0xfcf) [0x43575f]
> [abt1fk06:18828] [ 7] t2eco2n_mp(main+0x2c) [0x4ea29c]
> [abt1fk06:18828] [ 8] /lib/libc.so.6(__libc_start_main+0xe6) [0x7fd69091b1a6]
> [abt1fk06:18828] [ 9] t2eco2n_mp [0x4138e9]
> [abt1fk06:18828] *** End of error message ***
> mpirun.openmpi noticed that job rank 0 with PID 18827 on node abt1fk06
> exited on signal 11 (Segmentation fault).
> 1 additional process aborted (not shown)
> ###############################################################
>
>
> Thank you
> Torsten
>
>
>> On 9/2/2011 7:33 AM, Jeff Squyres wrote:
>>> You should probably be using Open MPI. I don't know if that will fix
>>> your problem, but LAM/MPI was basically abandoned several years ago in
>>> favor of Open MPI. Specifically: Open MPI is where all the development
>>> and community is these days.
>>>
>> Anyway, you can't mix lam and openmpi as you are doing by creating .o
>> files with lam and using the openmpi mpif90 to link.
>>>
>>> On Sep 2, 2011, at 7:31 AM, tlange_at_[hidden] wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to compile a parallel fluid flow/transport code on a Debian
>>>> Lenny system at home. I have already done this successfully on Debian
>>>> Squeeze at work.
>>>>
>>>> I use two libraries, Aztec and Metis. Only when linking all modules do
>>>> I get undefined references to lam_mpi_...
>>>>
>>>> # BINDING:
>>>> ==========
>>>> mpif90 -o t2eco2n_mp -O -fdefault-real-8 Data_DD.o Mem_Alloc.o MULTI.o
>>>> Main_Comp.o TOUGH2.o Compu_Eos.o Input_Output.o Mesh_Maker.o
>>>> Paral_Subs.o
>>>> Utility_F.o libmetis.a libaztec.a
>>>> libaztec.a(az_old_matvec_mult.o): In function `AZ_matvec_mult':
>>>> az_old_matvec_mult.c:(.text+0xc3): undefined reference to
>>>> `lam_mpi_comm_world'
>>>> az_old_matvec_mult.c:(.text+0xd9): undefined reference to
>>>> `lam_mpi_comm_world'
>>>> libaztec.a(md_wrap_mpi_c.o): In function `md_mpi_iwrite':
>>>> md_wrap_mpi_c.c:(.text+0xe): undefined reference to
>>>> `lam_mpi_comm_world'
>>>> md_wrap_mpi_c.c:(.text+0x18): undefined reference to `lam_mpi_byte'
>>>> md_wrap_mpi_c.c:(.text+0x31): undefined reference to
>>>> `lam_mpi_comm_world'
>>>> md_wrap_mpi_c.c:(.text+0x3b): undefined reference to `lam_mpi_byte'
>>>> libaztec.a(md_wrap_mpi_c.o): In function `md_wrap_iwrite':
>>>>
>>>> # ...and so on
>>>>
>>>> # Bind with "-showme"
>>>> gfortran -I/usr/lib/openmpi/include -pthread -I/usr/lib/openmpi/lib -o
>>>> t2eco2n_mp -O -fdefault-real-8 Data_DD.o Mem_Alloc.o MULTI.o
>>>> Main_Comp.o
>>>> TOUGH2.o Compu_Eos.o Input_Output.o Mesh_Maker.o Paral_Subs.o
>>>> Utility_F.o
>>>> libmetis.a libaztec.a -L/usr/lib/openmpi/lib -lmpi_f90 -lmpi_f77 -lmpi
>>>> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
>>>>
>>

Yes, it is still suspicious: you built with LAM, yet the run-time
messages come from mpirun.openmpi, as if Open MPI is still on your PATH
(ahead of, or in place of, LAM). If Open MPI is installed in /usr/bin/
and /usr/lib/, that makes it harder to use another MPI.
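
One way to check for such a mix is to look at the MPI symbols each object
file or archive refers to. The script below is only a rough sketch, not a
definitive test. It assumes that objects compiled against LAM/MPI reference
lam_mpi_* symbols (as in the linker errors quoted above) and that objects
compiled against Open MPI reference ompi_mpi_* symbols. The helper name
classify_mpi_symbols and the script name check_mpi.sh are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: guess which MPI implementation an object file or archive was
# compiled against, judging by the symbol names that `nm` reports.
# Assumption: LAM/MPI objects reference lam_mpi_* symbols and Open MPI
# objects reference ompi_mpi_* symbols.

# classify_mpi_symbols is a hypothetical helper; it reads `nm` output on stdin
# and prints one of: lam, openmpi, mixed, unknown.
classify_mpi_symbols() {
    awk '
        /lam_mpi_/  { lam = 1 }
        /ompi_mpi_/ { ompi = 1 }
        END {
            if (lam && ompi) print "mixed"
            else if (lam)    print "lam"
            else if (ompi)   print "openmpi"
            else             print "unknown"
        }'
}

# Report the guessed flavour of every file given on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(nm "$f" 2>/dev/null | classify_mpi_symbols)"
done
```

For example, "sh check_mpi.sh libaztec.a libmetis.a *.o" prints one line per
file. Fortran objects will probably show up as "unknown", since they tend to
reference only generic MPI routine names, so the result matters most for C
code such as Aztec. To see which MPI the shell picks up, compare the output
of "which mpirun mpif90" with "mpif90 -showme".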

-- 
Tim Prince