
LAM/MPI General User's Mailing List Archives


From: Santosh Mahadeorao Bobade (santosh_at_[hidden])
Date: 2004-01-31 04:15:24


Dear all,
I am a new user of LAM/MPI, and I am trying to run the parallel version of
the ABINIT code.
I was able to configure LAM/MPI with a prefix and the options
--with-rsh=rsh --with-rpi=tcp. No errors were reported while booting the
LAM cluster, and I have also run all the examples provided with lam-7.0.1.
Everything appears to be okay.
Here is the output of laminfo:

           LAM/MPI: 7.0.4
            Prefix: /usr
      Architecture: i686-pc-linux-gnu
     Configured by: root
     Configured on: Fri Jan 30 23:01:32 IST 2004
    Configure host: node1
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0.1)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)
    
The kernel is 2.4.xx and the distribution is Red Hat 9.
The /tmp directory is shared over NFS.
I use -wd /home/santosh/abinit-4.1.4, which is where all the executables
are stored. The command I am using is

    mpirun C -s n0 abinip < input.files >& input.log

and the current working directory is /home/santosh/abinit-4.1.4.
The process starts, and I can see in the system log that communication has
been set up and data are being transferred.
Here are the system log details:

Jan 31 00:53:53 node2 lamd[4967]: Link 0: node: 0, cpus: 1, type: 384, ip: 10.111.64.12
Jan 31 00:53:53 node2 lamd[4967]: Link 1: node: 1, cpus: 1, type: 0, ip: 10.111.64.5
Jan 31 00:53:53 node2 lamd[4967]: Link 2: node: 2, cpus: 1, type: 0, ip: 10.111.64.9
Jan 31 00:54:31 node2 lamd[4967]: flatd: flqload - successfully created file /tmp/lam-santosh_at_node2/lam-flatd0
Jan 31 00:54:32 node2 lamd[4967]: flatd: flqload - file descriptor 15
Jan 31 00:54:32 node2 lamd[4967]: flatd: flqload - successfully appended 7 bytes to /tmp/lam-santosh_at_node2/lam-flatd0
Jan 31 00:54:33 node2 lamd[4967]: flatd: flqload - successfully created file /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 00:54:33 node2 lamd[4967]: flatd: flqload - file descriptor 15
Jan 31 00:54:34 node2 lamd[4967]: flatd: flqload - successfully appended 8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 00:55:05 node2 last message repeated 42 times
Jan 31 00:56:06 node2 last message repeated 79 times
Jan 31 00:57:07 node2 last message repeated 105 times
Jan 31 00:58:08 node2 last message repeated 90 times
Jan 31 00:59:09 node2 last message repeated 89 times
Jan 31 01:00:10 node2 last message repeated 91 times
Jan 31 01:00:24 node2 last message repeated 20 times
Jan 31 01:00:24 node2 lamd[4967]: kio_req: new client on fd=15
Jan 31 01:00:25 node2 lamd[4967]: flatd: flqload - successfully appended 8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:00:25 node2 lamd[4967]: flatd: flqload - successfully appended 8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:00:26 node2 lamd[4967]: kouter: attached process pid=4974, pri=1095, fd=15
Jan 31 01:00:27 node2 lamd[4967]: flatd: flqload - successfully appended 8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:00:58 node2 last message repeated 48 times
Jan 31 01:01:59 node2 last message repeated 100 times
Jan 31 01:03:00 node2 last message repeated 88 times
Jan 31 01:03:14 node2 last message repeated 28 times
Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach detached process pid=4974
Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach calling kio_close
Jan 31 01:03:14 node2 lamd[4967]: kouter: kqdetach calling knuke
Jan 31 01:03:15 node2 lamd[4967]: flatd: flqload - successfully appended 8192 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:03:46 node2 last message repeated 47 times
Jan 31 01:04:47 node2 last message repeated 81 times
Jan 31 01:05:48 node2 last message repeated 83 times
Jan 31 01:06:49 node2 last message repeated 104 times
Jan 31 01:07:50 node2 last message repeated 95 times
Jan 31 01:08:51 node2 last message repeated 96 times
Jan 31 01:09:52 node2 last message repeated 100 times
Jan 31 01:10:53 node2 last message repeated 105 times
Jan 31 01:11:54 node2 last message repeated 98 times
Jan 31 01:12:09 node2 last message repeated 27 times
Jan 31 01:12:10 node2 lamd[4967]: flatd: flqload - successfully appended 2119 bytes to /tmp/lam-santosh_at_node2/lam-flatd1
Jan 31 01:12:10 node2 lamd[4967]: kenyad: pqcreating with rtf 0x449810
Jan 31 01:12:10 node2 lamd[4967]: kenyad: checking for directory /home/santosh/abinit-4.1.4
Jan 31 01:12:10 node2 lamd[4967]: kenyad: kenyad laded executable from remote node: /tmp/lam-santosh_at_node2/lam-flatd1 and 4158
Jan 31 01:12:11 node2 lamd[4967]: kenyad: found "/tmp/lam-santosh_at_node2/lam-flatd1"
Jan 31 01:12:11 node2 lamd[4967]: kenyad: creating new user process...
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting environment variables to pass to new process
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting TROLLIUSFD
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting TROLLIUSRTF
Jan 31 01:12:11 node2 lamd[4967]: kenyad: setting LAMJOBID
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMKENYAPID
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMWORLD
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMPARENT
Jan 31 01:12:12 node2 lamd[4967]: kenyad: setting LAMRANK
Jan 31 01:12:12 node2 lamd[4967]: kenyad: checking for working directory flag
Jan 31 01:12:13 node2 lamd[4967]: kenyad: working directory set explicitly
Jan 31 01:12:13 node2 lamd[4967]: kenyad: running in directory /home/santosh/abinit-4.1.4
Jan 31 01:12:13 node2 lamd[4967]: kenyad: fork/exec succeeded, pid 4986, index 11, rtf 0x449812
Jan 31 01:12:13 node2 lamd[4967]: kenyad: create succeeded, process running
Jan 31 01:12:14 node2 lamd[4967]: kenyad: removing load module "/tmp/lam-santosh_at_node2/lam-flatd1"
Jan 31 01:12:14 node2 lamd[4967]: died: caught child death; trying to detach

The process appears to die right after it is started (see the last line of
the log). Can anybody help me out?

Thanks in advance.

With regards and best wishes,
Santosh Bobade
Indian Institute of Technology Bombay
India