I am trying to run a very simple MPI program in this environment: LAM-7.0
64-bit, GM-2.0.2, Linux/Power4, and I encounter some problems when running
across 2 nodes. My test program seems to hang on the remote node. If I
attach to a process running on the remote node, I get:
#1 0x0000007fe0bfe324 in ioctl () from /lib64/libc.so.6
#2 0x0000007fe0361740 in __gm_user_ioctl (p=0x4, cmd=18179,
buf=0x1ff7fffcb70, bufsize=4) at ./libgm/_gm_user_ioctl.c:69
#3 0x0000007fe03616ac in _gm_user_ioctl (p=0x1005c630, cmd=18179, buf=0x0,
bufsize=0) at ./libgm/_gm_user_ioctl.c:127
#4 0x0000007fe0361538 in _gm_sleep (p=0x1005c630) at
./libgm/_gm_sleep.c:39
#5 0x0000007fe036aff4 in _gm_unknown (p=0x1005c630, e=0x7fe03b3438)
at ./libgm/gm_unknown.c:135
#6 0x0000007fe036b278 in gm_unknown (p=0x1005c630, e=0x4703)
at ./libgm/gm_unknown.c:355
#7 0x0000007fe01b4170 in lam_ssi_rpi_gm_gm_advance ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#8 0x0000007fe01c5310 in lam_ssi_rpi_gm_advance ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#9 0x0000007fe019b694 in _mpi_req_advance ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#10 0x0000007fe019c014 in lam_send () from
/opt/actc/lam-7.0/64/lib/libmpi.so.0
#11 0x0000007fe01e65e8 in PMPI_Send ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#12 0x0000007fe01af648 in lam_ssi_coll_lam_basic_reduce_log ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#13 0x0000007fe01e5828 in PMPI_Reduce ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#14 0x0000007fe01aed74 in lam_ssi_coll_lam_basic_allreduce ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#15 0x0000007fe0180b90 in MPI_Allreduce ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#16 0x0000007fe018eb88 in lam_coll_alloc_intra_cid ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#17 0x0000007fe01db674 in PMPI_Comm_split ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#18 0x0000007fe01b2058 in lam_ssi_coll_smp_init ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#19 0x0000007fe01ae92c in check_all_modules ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#20 0x0000007fe01ae62c in lam_ssi_coll_base_init_comm ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#21 0x0000007fe0192124 in lam_mpi_init ()
from /opt/actc/lam-7.0/64/lib/libmpi.so.0
#22 0x0000007fe018c154 in MPI_Init () from
/opt/actc/lam-7.0/64/lib/libmpi.so.0
If I run the same "hello world" program on 1 node, or across 2 nodes with
-ssi rpi tcp, it runs fine. Is LAM/MPI 7.0 known to work with GM-2.0.2?
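For reference, I am launching it roughly as follows (the hostfile and
binary names are placeholders):

```shell
# Boot the LAM runtime across both nodes (hostfile lists the two hosts)
lamboot -v hostfile

# Hangs inside MPI_Init when using the gm RPI across 2 nodes
mpirun -np 2 -ssi rpi gm ./hello

# Works fine with the tcp RPI, or with both ranks on a single node
mpirun -np 2 -ssi rpi tcp ./hello
```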
Salutations/Regards.
============================================
Dr. Francois THOMAS, EMEA-PSSC RS/6000 SP Group
Tel : (33)-4-67344061, GSM : (33)-6-83258855 Fax : (33)-4-67346477
ft_at_[hidden], ICQ# 95392338, http://ft-fr.userv.ibm.com
============================================