LAM/MPI General User's Mailing List Archives

From: P. W. Leung (phleung_at_[hidden])
Date: 2004-02-04 02:23:36


Recently I encountered a problem using LAM 7 together with OpenMosix. To
keep things simple, I installed RH9 on a P4 PC and patched it with the
OpenMosix 2.4.22-2 rpm. Since there is only one node, OpenMosix will not
migrate any process. I then ran an MPI program compiled with LAM 7.0.4.
This program uses only one process, so no MPI communication occurs.
  As the program runs, it suddenly takes up more and more memory until
it exceeds 2GB and dies. This is reproducible, and when I run it in gdb
I get the error messages below, which seem to point to the pthread
library. Note that this problem occurs only when I use LAM 7 and
OpenMosix together. In particular, there is no problem in the
following situations:

1. when I turn off OpenMosix with "/etc/rc.d/init.d/openmosix stop" and
run an MPI program compiled with LAM 7.0.4;
2. when OpenMosix is on but I run a serial (non-MPI) program;
3. when OpenMosix is on and I run an MPI program compiled with LAM 6.5.9.

Finally, I found that the same problem occurs when I run HPL from
netlib, so the problem is not specific to our program.

---------------------------
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 22439)]
0x40079e8c in __pthread_alt_lock () from /lib/i686/libpthread.so.0
(gdb) where
#0 0x40079e8c in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#1 0x40076da7 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#2 0x401333ba in free () from /lib/i686/libc.so.6
#3 0x080836b1 in lam_ssi_rpi_tcp_destroy ()
#4 0x080734b0 in _mpi_req_destroy ()
#5 0x0807b62f in lam_test ()
#6 0x0807784e in MPI_Test ()
#7 0x0805388c in multserver (a=0x4266e008, d=0x103264e, ia=0x0) at mesg.c:86
#8 0x08053760 in kill_msgserver (send_id=0x80b5ba0, a=0x4266e008,
     d=0x103264e, ia=0x0, count=0x80b9350, handler=0x805386e <multserver>)
     at mesg.c:58
#9 0x080517f8 in multiply_add_matrix_with_vector (pv=7, pw=8, v0=0x4266e008,
     v1=0x4a802008) at mult.c:141
#10 0x08052698 in lanczos (restart_flag=0, eigenvector_flag=0, read_w_flag=0,
     green_flag=0) at lanczos.c:219
#11 0x0804ab6b in main (argc=0, argv=0xbffff744) at tJ2h.c:431
#12 0x400d5a67 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)

-- 
====================================================================
Pakwo Leung / Associate Professor
Physics Department                          |Email: P.W.Leung_at_[hidden]
Hong Kong University of Science & Technology|Phone: +(852)2358-7483
Clear Water Bay, Hong Kong                  |FAX:   +(852)2358-1652
http://physics.ust.hk/Leung