Hi,
Can you please send the output of mpirun itself, in addition to the syslog
output that you have already sent? That might give us a better idea of the
problem that's occurring.
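If it helps, here is a minimal sketch of one way to save everything mpirun
prints (both stdout and stderr) to a single file in a Bourne-style shell. The
helper name capture_run and the file name mpirun.log are only illustrative and
are not part of LAM/MPI; the mpirun arguments in the comment are copied from
your message.

```shell
# capture_run is a hypothetical helper, not a LAM/MPI command.
# It runs the given command, writes stdout and stderr to one log file,
# and appends the command's exit status at the end of that file.
capture_run() {
    cr_log_file=$1
    shift
    "$@" > "$cr_log_file" 2>&1
    cr_status=$?
    echo "exit status: $cr_status" >> "$cr_log_file"
    return "$cr_status"
}

# Example, assuming the same arguments as in the original post:
#   capture_run mpirun.log mpirun C -s n0 abinip < input.files
```

The resulting mpirun.log can then be attached to a reply on the list.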
Nihar
On Sat, 31 Jan 2004, Santosh Mahadeorao Bobade wrote:
>Dear all,
>I am a new user of LAM/MPI. I am trying to run the parallel version of the
>abinit code.
>I was able to configure LAM/MPI with the options --with-rsh=rsh --with-rpi=tcp.
>There was no error reported while booting the LAM cluster. I have also run all
>the examples provided with lam-7.0.1.
>It appears that everything is okay.
>Here is the output of laminfo:
>
> LAM/MPI: 7.0.4
> Prefix: /usr
> Architecture: i686-pc-linux-gnu
> Configured by: root
> Configured on: Fri Jan 30 23:01:32 IST 2004
> Configure host: node1
> C bindings: yes
> C++ bindings: yes
> Fortran bindings: yes
> C profiling: yes
> C++ profiling: yes
> Fortran profiling: yes
> ROMIO support: yes
> IMPI support: no
> Debug support: no
> Purify clean: no
> SSI boot: globus (Module v0.5)
> SSI boot: rsh (Module v1.0)
> SSI coll: lam_basic (Module v7.0)
> SSI coll: smp (Module v1.0)
> SSI rpi: crtcp (Module v1.0.1)
> SSI rpi: lamd (Module v7.0)
> SSI rpi: sysv (Module v7.0)
> SSI rpi: tcp (Module v7.0)
> SSI rpi: usysv (Module v7.0)
>
>The kernel is 2.4.xx and the distribution is Red Hat 9.
>The /tmp folder is shared over NFS,
>and -wd is /home/santosh/abinit-4.1.4, where all the executables are stored.
>The command I am using is: mpirun C -s n0 abinip <input.files>& input.log
>and the pwd is /home/santosh/abinit-4.1.4.
>The processes get started, and I can see in the system log that communication
>has been set up and data are being transferred.
>Here are the system log details:
>
>Jan 31 00:53:53 node2 lamd[4967]: Link 0: node: 0, cpus: 1, type: 384, ip:
>10.111.64.12
>Jan 31 00:53:53 node2 lamd[4967]: Link 1: node: 1, cpus: 1, type: 0, ip:
>10.111.64.5
>Jan 31 00:53:53 node2 lamd[4967]: Link 2: node: 2, cpus: 1, type: 0, ip:
>10.111.64.9
>Jan 31 00:54:31 node2 lamd[4967]: flatd: flqload - successfully created
>file /tmp/lam-santosh_at_node2/lam-flatd0
>Jan 31 00:54:32 node2 lamd[4967]: flatd: flqload - file descriptor 15
>Jan 31 00:54:32 node2 lamd[4967]: flatd: flqload - successfully appended 7
>bytes to /tmp/lam-santosh_at_node2/lam-flatd0
>Jan 31 00:54:33 node2 lamd[4967]: flatd: flqload - successfully created
>file /tmp/lam-santosh_at_node2/lam-flatd1
>Jan 31 00:54:33 node2 lamd[4967]: flatd: flqload - file descriptor 15
>Jan 31 00:54:34 node2 lamd[4967]: flatd: flqload - successfully appended
>8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
>Jan 31 00:55:05 node2 last message repeated 42 times
>Jan 31 00:56:06 node2 last message repeated 79 times
>Jan 31 00:57:07 node2 last message repeated 105 times
>Jan 31 00:58:08 node2 last message repeated 90 times
>Jan 31 00:59:09 node2 last message repeated 89 times
>Jan 31 01:00:10 node2 last message repeated 91 times
>Jan 31 01:00:24 node2 last message repeated 20 times
>Jan 31 01:00:24 node2 lamd[4967]: kio_req: new client on fd=15
>Jan 31 01:00:25 node2 lamd[4967]: flatd: flqload - successfully appended
>8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
>Jan 31 01:00:25 node2 lamd[4967]: flatd: flqload - successfully appended
>8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
>Jan 31 01:00:26 node2 lamd[4967]: kouter: attached process pid=4974,
>pri=1095, fd=15
>Jan 31 01:00:27 node2 lamd[4967]: flatd: flqload - successfully appended
>8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
>Jan 31 01:00:58 node2 last message repeated 48 times
>Jan 31 01:01:59 node2 last message repeated 100 times
>Jan 31 01:03:00 node2 last message repeated 88 times
>Jan 31 01:03:14 node2 last message repeated 28 times
>Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach detached process
>pid=4974
>Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach calling kio_close
>Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach calling knuke
>Jan 31 01:03:15 node2 lamd[4967]: flatd: flqload - successfully appended
>8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
>Jan 31 01:03:46 node2 last message repeated 47 times
>Jan 31 01:04:47 node2 last message repeated 81 times
>Jan 31 01:05:48 node2 last message repeated 83 times
>Jan 31 01:06:49 node2 last message repeated 104 times
>Jan 31 01:07:50 node2 last message repeated 95 times
>Jan 31 01:08:51 node2 last message repeated 96 times
>Jan 31 01:09:52 node2 last message repeated 100 times
>Jan 31 01:10:53 node2 last message repeated 105 times
>Jan 31 01:11:54 node2 last message repeated 98 times
>Jan 31 01:12:09 node2 last message repeated 27 times
>Jan 31 01:12:10 node2 lamd[4967]: flatd: flqload - successfully appended
>2119 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
>Jan 31 01:12:10 node2 lamd[4967]: kenyad: pqcreating with rtf 0x449810
>Jan 31 01:12:10 node2 lamd[4967]: kenyad: checking for directory
>/home/santosh/abinit-4.1.4
>Jan 31 01:12:10 node2 lamd[4967]: kenyad: kenyad laded executable from
>remote node: /tmp/lam-santosh_at_node2/lam-flatd1 and 4158
>Jan 31 01:12:11 node2 lamd[4967]: kenyad: found
>"/tmp/lam-santosh_at_node2/lam-flatd1"
>Jan 31 01:12:11 node2 lamd[4967]: kenyad: creating new user process...
>Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting environment variables to
>pass to new process
>Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting TROLLIUSFD
>Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting TROLLIUSRTF
>Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting LAMJOBID
>Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMKENYAPID
>Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMWORLD
>Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMPARENT
>Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMRANK
>Jan 31 01:12:12 node2 lamd[4967]: kenyad: checking for working directory
>flag
>Jan 31 01:12:13 node2 lamd[4967]: kenyad: working directory set explicitly
>Jan 31 01:12:13 node2 lamd[4967]: kenyad: running in directory
>/home/santosh/abinit-4.1.4
>Jan 31 01:12:13 node2 lamd[4967]: kenyad: fork/exec succeeded, pid 4986,
>index 11, rtf 0x449812
>Jan 31 01:12:13 node2 lamd[4967]: kenyad: create succeeded, process
>running
>Jan 31 01:12:14 node2 lamd[4967]: kenyad: removing load module
>"/tmp/lam-santosh_at_node2/lam-flatd1"
>Jan 31 01:12:14 node2 lamd[4967]: died: caught child death; trying to
>detach
>
>Can anybody help me out?
>
>Thanks in advance.
>
>With regards and best wishes,
>Santosh Bobade
>Indian Institute of Technology Bombay
>India
>
>_______________________________________________
>This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
Powered by LAM/MPI...
---------------------------------------
Nihar Sanghvi
LAM/MPI Team
Graduate Student (Indiana University)
http://www.lam-mpi.org
--------------------------------------