
LAM/MPI General User's Mailing List Archives


From: tlange_at_[hidden]
Date: 2011-09-02 12:48:37


> On 9/2/2011 9:27 AM, tlange_at_[hidden] wrote:
>> Ok, thank you!
>>
>> The compile seems OK now. However, I get segfaults during the run (see
>> listing). I had those on my computer at work too, but could fix them
>> there with a provided replacement for the source file responsible for
>> MPI communication... That doesn't work here. Anyway, is the error
>> listing below suspicious of MPI communication problems?
>>
>> ###############################################################
>> /usr/bin/mpirun.openmpi -np 2 t2eco2n_mp
>> [abt1fk06:18827] *** Process received signal ***
>> [abt1fk06:18827] Signal: Segmentation fault (11)
>> [abt1fk06:18827] Signal code: Address not mapped (1)
>> [abt1fk06:18827] Failing at address: 0xf3
>> [abt1fk06:18827] [ 0] /lib/libpthread.so.0 [0x7f6579269a80]
>> [abt1fk06:18827] [ 1] /usr/lib/libmpi.so.0(MPI_Comm_size+0x4e)
>> [0x7f657a71c74e]
>> [abt1fk06:18827] [ 2] t2eco2n_mp(parallel_info+0x1e) [0x4b999e]
>> [abt1fk06:18827] [ 3] t2eco2n_mp(AZ_set_proc_config+0x2d) [0x4a4b12]
>> [abt1fk06:18827] [ 4] t2eco2n_mp(az_set_proc_config_+0xb) [0x4a427f]
>> [abt1fk06:18827] [ 5] t2eco2n_mp(cycit_+0x2204) [0x42f014]
>> [abt1fk06:18827] [ 6] t2eco2n_mp(MAIN__+0xfcf) [0x43575f]
>> [abt1fk06:18827] [ 7] t2eco2n_mp(main+0x2c) [0x4ea29c]
>> [abt1fk06:18827] [ 8] /lib/libc.so.6(__libc_start_main+0xe6)
>> [0x7f6578f261a6]
>> [abt1fk06:18827] [ 9] t2eco2n_mp [0x4138e9]
>> [abt1fk06:18827] *** End of error message ***
>> [abt1fk06:18828] *** Process received signal ***
>> [abt1fk06:18828] Signal: Segmentation fault (11)
>> [abt1fk06:18828] Signal code: Address not mapped (1)
>> [abt1fk06:18828] Failing at address: 0xf3
>> [abt1fk06:18828] [ 0] /lib/libpthread.so.0 [0x7fd690c5ea80]
>> [abt1fk06:18828] [ 1] /usr/lib/libmpi.so.0(MPI_Comm_size+0x4e)
>> [0x7fd69211174e]
>> [abt1fk06:18828] [ 2] t2eco2n_mp(parallel_info+0x1e) [0x4b999e]
>> [abt1fk06:18828] [ 3] t2eco2n_mp(AZ_set_proc_config+0x2d) [0x4a4b12]
>> [abt1fk06:18828] [ 4] t2eco2n_mp(az_set_proc_config_+0xb) [0x4a427f]
>> [abt1fk06:18828] [ 5] t2eco2n_mp(cycit_+0x2204) [0x42f014]
>> [abt1fk06:18828] [ 6] t2eco2n_mp(MAIN__+0xfcf) [0x43575f]
>> [abt1fk06:18828] [ 7] t2eco2n_mp(main+0x2c) [0x4ea29c]
>> [abt1fk06:18828] [ 8] /lib/libc.so.6(__libc_start_main+0xe6)
>> [0x7fd69091b1a6]
>> [abt1fk06:18828] [ 9] t2eco2n_mp [0x4138e9]
>> [abt1fk06:18828] *** End of error message ***
>> mpirun.openmpi noticed that job rank 0 with PID 18827 on node abt1fk06
>> exited on signal 11 (Segmentation fault).
>> 1 additional process aborted (not shown)
>> ###############################################################
>>
>>
>> Thank you
>> Torsten
>>
>>
>>> On 9/2/2011 7:33 AM, Jeff Squyres wrote:
>>>> You should probably be using Open MPI. I don't know if that will fix
>>>> your problem, but LAM/MPI was basically abandoned several years ago in
>>>> favor of Open MPI. Specifically: Open MPI is where all the
>>>> development
>>>> and community is these days.
>>>>
>>> Anyway, you can't mix lam and openmpi as you are doing by creating .o
>>> files with lam and using the openmpi mpif90 to link.
>>>>
>>>> On Sep 2, 2011, at 7:31 AM, tlange_at_[hidden] wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to compile a parallel fluid flow/transport code on a
>>>>> Debian Lenny system at home. I have successfully done this using
>>>>> Debian Squeeze at work.
>>>>>
>>>>> I use two libraries, Aztec and Metis. When linking all the modules I
>>>>> get undefined references to lam_mpi_...
>>>>>
>>>>> # BINDING:
>>>>> ==========
>>>>> mpif90 -o t2eco2n_mp -O -fdefault-real-8 Data_DD.o Mem_Alloc.o
>>>>> MULTI.o
>>>>> Main_Comp.o TOUGH2.o Compu_Eos.o Input_Output.o Mesh_Maker.o
>>>>> Paral_Subs.o
>>>>> Utility_F.o libmetis.a libaztec.a
>>>>> libaztec.a(az_old_matvec_mult.o): In function `AZ_matvec_mult':
>>>>> az_old_matvec_mult.c:(.text+0xc3): undefined reference to
>>>>> `lam_mpi_comm_world'
>>>>> # ...and so on
>>>>>
>>>>> # Bind with "-showme"
>>>>> gfortran -I/usr/lib/openmpi/include -pthread -I/usr/lib/openmpi/lib
>>>>> -o
>>>>> t2eco2n_mp -O -fdefault-real-8 Data_DD.o Mem_Alloc.o MULTI.o
>>>>> Main_Comp.o
>>>>> TOUGH2.o Compu_Eos.o Input_Output.o Mesh_Maker.o Paral_Subs.o
>>>>> Utility_F.o
>>>>> libmetis.a libaztec.a -L/usr/lib/openmpi/lib -lmpi_f90 -lmpi_f77
>>>>> -lmpi
>>>>> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
>>>>>
>>>
>
> Yes, it's still suspicious that you have built with lam but are getting
> a run-time message from mpirun.openmpi, as if you still have openmpi on
> your PATH (ahead of or in place of lam). If you have installed openmpi
> in /usr/bin/ and /usr/lib/, this increases the difficulty of using
> another MPI.
> --
> Tim Prince

No, there was no LAM anymore; I had purged the Debian Lenny packages
before. Worse, though: in my total MPI inexperience I had MPICH installed
too, which interfered. I am only slowly getting into this... After purging
MPICH and recompiling, the simulations run properly.

I really thank you!
Torsten
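
For anyone hitting the same symptoms (objects built against one MPI, linked or launched with another), a quick sanity check of which MPI stack is actually on the PATH can be sketched as below. The tool names are the standard Open MPI ones; the binary name `t2eco2n_mp` is taken from the thread, and exact paths will differ per system:

```shell
# Which compiler wrapper and launcher are actually first on the PATH?
command -v mpif90 mpirun || echo "no MPI wrappers found on PATH"

# What does the wrapper really link against? Open MPI wrappers
# support -showme; other MPIs (e.g. MPICH) will reject the flag.
mpif90 -showme:link 2>/dev/null || echo "mpif90 missing or not Open MPI"

# After a clean rebuild, confirm the binary resolves to a single
# MPI library (uncomment and adapt the binary name):
# ldd ./t2eco2n_mp | grep -i mpi
```

If the two commands point at different installations (say, `/usr/bin/mpirun` from one package and `mpif90` from another), purging all but one MPI package, as done above, is usually the simplest fix on Debian.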