Dear all,
I am a new user of LAM/MPI. I am trying to run the parallel version of the abinit
code.
I was able to configure LAM/MPI with the options --with-rsh=rsh --with-rpi=tcp.
No errors were reported while booting the LAM cluster. I have also run all
the examples provided with lam-7.0.1.
Everything appears to be okay.
Here is the output of laminfo:
LAM/MPI: 7.0.4
Prefix: /usr
Architecture: i686-pc-linux-gnu
Configured by: root
Configured on: Fri Jan 30 23:01:32 IST 2004
Configure host: node1
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (Module v0.5)
SSI boot: rsh (Module v1.0)
SSI coll: lam_basic (Module v7.0)
SSI coll: smp (Module v1.0)
SSI rpi: crtcp (Module v1.0.1)
SSI rpi: lamd (Module v7.0)
SSI rpi: sysv (Module v7.0)
SSI rpi: tcp (Module v7.0)
SSI rpi: usysv (Module v7.0)
The kernel is 2.4.xx
and the distribution is Red Hat 9.
The /tmp directory is shared over NFS,
and I use -wd /home/santosh/abinit-4.1.4, where all the executables are stored.
The command I am using is: mpirun C -s n0 abinip < input.files >& input.log
and the current directory (pwd) is /home/santosh/abinit-4.1.4.
The processes get started, and I can see in the system log that communication has
been set up and data are being transferred.
Here are the system log details:
Jan 31 00:53:53 node2 lamd[4967]: Link 0: node: 0, cpus: 1, type: 384, ip:
10.111.64.12
Jan 31 00:53:53 node2 lamd[4967]: Link 1: node: 1, cpus: 1, type: 0, ip:
10.111.64.5
Jan 31 00:53:53 node2 lamd[4967]: Link 2: node: 2, cpus: 1, type: 0, ip:
10.111.64.9
Jan 31 00:54:31 node2 lamd[4967]: flatd: flqload - successfully created
file /tmp/lam-santosh_at_node2/lam-flatd0
Jan 31 00:54:32 node2 lamd[4967]: flatd: flqload - file descriptor 15
Jan 31 00:54:32 node2 lamd[4967]: flatd: flqload - successfully appended 7
bytes to /tmp/lam-santosh_at_node2/lam-flatd0
Jan 31 00:54:33 node2 lamd[4967]: flatd: flqload - successfully created
file /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 00:54:33 node2 lamd[4967]: flatd: flqload - file descriptor 15
Jan 31 00:54:34 node2 lamd[4967]: flatd: flqload - successfully appended
8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 00:55:05 node2 last message repeated 42 times
Jan 31 00:56:06 node2 last message repeated 79 times
Jan 31 00:57:07 node2 last message repeated 105 times
Jan 31 00:58:08 node2 last message repeated 90 times
Jan 31 00:59:09 node2 last message repeated 89 times
Jan 31 01:00:10 node2 last message repeated 91 times
Jan 31 01:00:24 node2 last message repeated 20 times
Jan 31 01:00:24 node2 lamd[4967]: kio_req: new client on fd=15
Jan 31 01:00:25 node2 lamd[4967]: flatd: flqload - successfully appended
8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:00:25 node2 lamd[4967]: flatd: flqload - successfully appended
8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:00:26 node2 lamd[4967]: kouter: attached process pid=4974,
pri=1095, fd=15
Jan 31 01:00:27 node2 lamd[4967]: flatd: flqload - successfully appended
8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:00:58 node2 last message repeated 48 times
Jan 31 01:01:59 node2 last message repeated 100 times
Jan 31 01:03:00 node2 last message repeated 88 times
Jan 31 01:03:14 node2 last message repeated 28 times
Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach detached process
pid=4974
Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach calling kio_close
Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach calling knuke
Jan 31 01:03:15 node2 lamd[4967]: flatd: flqload - successfully appended
8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:03:46 node2 last message repeated 47 times
Jan 31 01:04:47 node2 last message repeated 81 times
Jan 31 01:05:48 node2 last message repeated 83 times
Jan 31 01:06:49 node2 last message repeated 104 times
Jan 31 01:07:50 node2 last message repeated 95 times
Jan 31 01:08:51 node2 last message repeated 96 times
Jan 31 01:09:52 node2 last message repeated 100 times
Jan 31 01:10:53 node2 last message repeated 105 times
Jan 31 01:11:54 node2 last message repeated 98 times
Jan 31 01:12:09 node2 last message repeated 27 times
Jan 31 01:12:10 node2 lamd[4967]: flatd: flqload - successfully appended
2119 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:12:10 node2 lamd[4967]: kenyad: pqcreating with rtf 0x449810
Jan 31 01:12:10 node2 lamd[4967]: kenyad: checking for directory
/home/santosh/abinit-4.1.4
Jan 31 01:12:10 node2 lamd[4967]: kenyad: kenyad laded executable from
remote node: /tmp/lam-santosh_at_node2/lam-flatd1 and 4158
Jan 31 01:12:11 node2 lamd[4967]: kenyad: found
"/tmp/lam-santosh_at_node2/lam-flatd1"
Jan 31 01:12:11 node2 lamd[4967]: kenyad: creating new user process...
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting environment variables to
pass to new process
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting TROLLIUSFD
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting TROLLIUSRTF
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting LAMJOBID
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMKENYAPID
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMWORLD
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMPARENT
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMRANK
Jan 31 01:12:12 node2 lamd[4967]: kenyad: checking for working directory
flag
Jan 31 01:12:13 node2 lamd[4967]: kenyad: working directory set explicitly
Jan 31 01:12:13 node2 lamd[4967]: kenyad: running in directory
/home/santosh/abinit-4.1.4
Jan 31 01:12:13 node2 lamd[4967]: kenyad: fork/exec succeeded, pid 4986,
index 11, rtf 0x449812
Jan 31 01:12:13 node2 lamd[4967]: kenyad: create succeeded, process
running
Jan 31 01:12:14 node2 lamd[4967]: kenyad: removing load module
"/tmp/lam-santosh_at_node2/lam-flatd1"
Jan 31 01:12:14 node2 lamd[4967]: died: caught child death; trying to
detach
Can anybody help me out?
Thanks in advance.
With regards and best wishes,
Santosh Bobade
Indian Institute of Technology Bombay
India