On Wed, 26 Jan 2005 feldy_at_[hidden] wrote:
>I'm trying to get Pallas AlltoAll to work on a cluster of SMPs using LAM-7.1.1
>(same behavior with 7.0.6). This is using standard tcp/ip via intel e1000
>on-board NICs. Using dual-processor 3GHz Xeon HT disabled.
>Linux FC3
>2.6.9-1.667smp #1 SMP Tue Nov 2 14:59:52 EST 2004 i686 i686 i386 GNU/Linux
I reran this with
-ssi rpi lamd
and it still hangs.
I attached gdb to all the processes and see the same backtrace on each one:
(gdb) where
#0 0xf6f624f8 in read () from /lib/i686/libpthread.so.0
#1 0x00000050 in ?? ()
#2 0x0809cfbb in mread ()
#3 0x08090e92 in _cio_kreqback ()
#4 0x0809d868 in _cipc_ksrback ()
#5 0x0809dcf1 in ksr ()
#6 0x08096422 in dsfr ()
#7 0x08096932 in bfrecv ()
#8 0x08074c1d in lamd_recv ()
#9 0x08074339 in lamd_adv1 ()
#10 0x08074a7a in lam_ssi_rpi_lamd_advance ()
#11 0x08052d55 in _mpi_req_advance ()
#12 0x0808520e in PMPI_Recv ()
#13 0x08079adb in lam_ssi_coll_lam_basic_bcast_lin_lamd ()
#14 0x080762db in PMPI_Bcast ()
#15 0x0807f920 in lam_ssi_coll_smp_bcast ()
#16 0x0808281a in lam_ssi_coll_smp_allreduce ()
#17 0x08054de3 in MPI_Allreduce ()
#18 0x0805656d in lam_coll_alloc_intra_cid ()
#19 0x08050ebf in MPI_Comm_split ()
#20 0x0804b7f2 in Set_Communicator ()
#21 0x0804b531 in Init_Communicator ()
#22 0x0804aa41 in main ()
[feldy_at_rudolph ~]$ mpitask -gps
TASK (GPS/L) FUNCTION PEER|ROOT TAG COMM COUNT DATATYPE
n0,i11/0 PMB-MPI1 Recv n8,i12/17 86 WORLD* 4095 BYTE
n0,i12/1 PMB-MPI1 Recv n0,i11/0 15 <2>* 4095 BYTE
n1,i11/1 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n1,i12/1 PMB-MPI1 Recv n1,i11/0 15 <2>* 4095 BYTE
n2,i11/2 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n2,i12/1 PMB-MPI1 Recv n2,i11/0 15 <2>* 4095 BYTE
n3,i11/3 PMB-MPI1 Recv n1,i11/1 15 <3>* 4095 BYTE
n3,i12/1 PMB-MPI1 Recv n3,i11/0 15 <2>* 4095 BYTE
n4,i11/4 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n4,i12/1 PMB-MPI1 Recv n4,i11/0 15 <2>* 4095 BYTE
n5,i11/5 PMB-MPI1 Recv n1,i11/1 15 <3>* 4095 BYTE
n5,i12/1 PMB-MPI1 Recv n5,i11/0 15 <2>* 4095 BYTE
n6,i11/6 PMB-MPI1 Recv n2,i11/2 15 <3>* 4095 BYTE
n6,i12/1 PMB-MPI1 Recv n6,i11/0 15 <2>* 4095 BYTE
n7,i11/7 PMB-MPI1 Recv n3,i11/3 15 <3>* 4095 BYTE
n7,i12/1 PMB-MPI1 Recv n7,i11/0 15 <2>* 4095 BYTE
n8,i11/8 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n8,i12/1 PMB-MPI1 Recv n8,i11/0 15 <2>* 4095 BYTE
[feldy_at_rudolph ~]$ mpimsg -gps
SRC (GPS/L) DEST (GPS/L) TAG COMM COUNT DATATYPE MSG
[no output]
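One way to read the mpitask listing above is as a wait-for graph: each row is a process blocked in Recv, and the PEER column names the process it is waiting on. The sketch below is only an illustration, not a LAM tool. It assumes the PEER column really identifies the blocking peer, it drops the "/rank" suffix so processes are compared by their "node,index" id, and the helper names (gps, wait_graph, find_cycle) are made up for this example.

```python
# Sketch: treat each blocked Recv in the mpitask listing as an edge
# "task waits for peer" and follow the edges until a process repeats.
# The rows are copied from the mpitask -gps output above.
MPITASK_ROWS = """\
n0,i11/0 PMB-MPI1 Recv n8,i12/17 86 WORLD* 4095 BYTE
n0,i12/1 PMB-MPI1 Recv n0,i11/0 15 <2>* 4095 BYTE
n1,i11/1 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n1,i12/1 PMB-MPI1 Recv n1,i11/0 15 <2>* 4095 BYTE
n2,i11/2 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n2,i12/1 PMB-MPI1 Recv n2,i11/0 15 <2>* 4095 BYTE
n3,i11/3 PMB-MPI1 Recv n1,i11/1 15 <3>* 4095 BYTE
n3,i12/1 PMB-MPI1 Recv n3,i11/0 15 <2>* 4095 BYTE
n4,i11/4 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n4,i12/1 PMB-MPI1 Recv n4,i11/0 15 <2>* 4095 BYTE
n5,i11/5 PMB-MPI1 Recv n1,i11/1 15 <3>* 4095 BYTE
n5,i12/1 PMB-MPI1 Recv n5,i11/0 15 <2>* 4095 BYTE
n6,i11/6 PMB-MPI1 Recv n2,i11/2 15 <3>* 4095 BYTE
n6,i12/1 PMB-MPI1 Recv n6,i11/0 15 <2>* 4095 BYTE
n7,i11/7 PMB-MPI1 Recv n3,i11/3 15 <3>* 4095 BYTE
n7,i12/1 PMB-MPI1 Recv n7,i11/0 15 <2>* 4095 BYTE
n8,i11/8 PMB-MPI1 Recv n0,i11/0 15 <3>* 4095 BYTE
n8,i12/1 PMB-MPI1 Recv n8,i11/0 15 <2>* 4095 BYTE
"""


def gps(field):
    """Drop the '/rank' suffix, keeping the 'node,index' process id."""
    return field.split("/")[0]


def wait_graph(rows):
    """Map each process blocked in Recv to the peer it is waiting on."""
    graph = {}
    for line in rows.splitlines():
        cols = line.split()
        if len(cols) < 4 or cols[2] != "Recv":
            continue  # skip headers, blank lines, and non-Recv rows
        graph[gps(cols[0])] = gps(cols[3])
    return graph


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle reached, or []."""
    path = []
    node = start
    while node in graph and node not in path:
        path.append(node)
        node = graph[node]
    if node in path:
        return path[path.index(node):]
    return []


WAITS = wait_graph(MPITASK_ROWS)
CYCLE = find_cycle(WAITS, "n0,i11")
print(" -> ".join(CYCLE + CYCLE[:1]))
```

If that reading of the columns is right, this prints n0,i11 -> n8,i12 -> n8,i11 -> n0,i11: three processes each blocked in Recv on the next, with every other process waiting (directly or indirectly) on n0,i11. That would be consistent with mpimsg showing no buffered messages, since nobody in the loop is sending.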
I'm baffled.
Bob