Send lam mailing list submissions to
lam@lam-mpi.org
Today's Topics:
1. processes with rank>0 exit after doing one receive and send
to process with rank =0 (Elf Okoye)
2. RPI fatal error (Sara Campos)
3. Mixing mpic++ and G95 (Serge Van Criekingen)
4. Re: Mixing mpic++ and G95 (Tim Prins)
5. Re: RPI fatal error (Jeff Squyres)
---------- Forwarded message ----------
From: "Elf Okoye" <ifiokoye@gmail.com>
To: lam@lam-mpi.org
Date: Wed, 14 Nov 2007 22:17:45 -0800
Subject: LAM: processes with rank>0 exit after doing one receive and send to process with rank =0
Hi all,
I'm doing a simple image convolution.
At this point, I'm just sending different rows of the image from the server (rank=0) to the clients (rank>0), having each client do the convolution for its row and send the result back to the server. [Once I understand what's going on, it's trivial to send all the rows to the different clients and then receive from all of them.]
I'm able to send to the clients and receive from them only once.
After this (according to the output of -sa), the clients exit cleanly (kill=0 and status=0), although they aren't supposed to.
So the server can't send to or receive from them anymore,
and the server (rank 0) ends up with kill=1 (bad exit) and signal=13 (SIGPIPE, i.e. a broken pipe, probably because it's trying to send to processes that have exited).
I'm running LAM 7.1.2/MPI 2 on Fedora Core 5.
It's C code, not Fortran.
All insights are welcome.
--Elf
--
"For those who understand, no explanation is needed; for those who do not, none will do."
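The original code was not posted, so the cause can only be guessed at. One common reason for this symptom is that each client performs its receive and send once and then falls through to MPI_Finalize instead of looping until the server tells it to stop. The following is a minimal C sketch of the pattern described in the message under that assumption, not the poster's program: the tags TAG_WORK and TAG_STOP, the sizes WIDTH and ROWS, the 3-tap kernel, and the helper convolve_row are all hypothetical. The MPI part is wrapped in #ifdef USE_MPI so that the row kernel can be compiled and checked without an MPI installation.

```c
#include <assert.h>

/* Hypothetical message tags (not from the original post). */
enum { TAG_WORK = 1, TAG_STOP = 2 };

/* Convolve one row with an odd-length kernel.
   Pixels outside the row are treated as zero. */
void convolve_row(const float *in, float *out, int width,
                  const float *kernel, int klen)
{
    int half = klen / 2;
    int x, k;

    assert(klen % 2 == 1);
    for (x = 0; x < width; ++x) {
        float acc = 0.0f;
        for (k = 0; k < klen; ++k) {
            int src = x + half - k;   /* true convolution: kernel is flipped */
            if (src >= 0 && src < width)
                acc += in[src] * kernel[k];
        }
        out[x] = acc;
    }
}

#ifdef USE_MPI
#include <mpi.h>

enum { WIDTH = 8, ROWS = 16 };   /* assumed image size */

/* Client (rank > 0): keep serving rows until the stop tag arrives. */
static void client(void)
{
    static const float kernel[3] = { 0.25f, 0.5f, 0.25f };
    float in[WIDTH], out[WIDTH];
    MPI_Status st;

    for (;;) {
        MPI_Recv(in, WIDTH, MPI_FLOAT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_STOP)
            break;
        convolve_row(in, out, WIDTH, kernel, 3);
        MPI_Send(out, WIDTH, MPI_FLOAT, 0, TAG_WORK, MPI_COMM_WORLD);
    }
}

/* Server (rank 0): one row at a time, round-robin over the clients. */
static void server(int nprocs)
{
    float row[WIDTH], result[WIDTH];
    MPI_Status st;
    int r, x, dest;

    for (r = 0; r < ROWS; ++r) {
        dest = 1 + r % (nprocs - 1);
        for (x = 0; x < WIDTH; ++x)
            row[x] = (float)(r + x);      /* placeholder pixel data */
        MPI_Send(row, WIDTH, MPI_FLOAT, dest, TAG_WORK, MPI_COMM_WORLD);
        MPI_Recv(result, WIDTH, MPI_FLOAT, dest, TAG_WORK, MPI_COMM_WORLD, &st);
    }
    /* Only now release the clients with a zero-length stop message. */
    for (dest = 1; dest < nprocs; ++dest)
        MPI_Send(row, 0, MPI_FLOAT, dest, TAG_STOP, MPI_COMM_WORLD);
}

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs >= 2) {            /* needs a server and at least one client */
        if (rank == 0)
            server(nprocs);
        else
            client();
    }
    MPI_Finalize();
    return 0;
}
#endif /* USE_MPI */
```

With LAM this could be built and run along the lines of "mpicc -DUSE_MPI conv.c -o conv" followed by "mpirun -np 4 conv" (conv.c is a placeholder file name). The point of the sketch is that the clients only leave their loop when they see TAG_STOP, so the server can keep sending rows for as long as it needs to.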
---------- Forwarded message ----------
From: Sara Campos <scampos@itqb.unl.pt>
To: lam@lam-mpi.org
Date: Thu, 15 Nov 2007 15:37:25 +0000
Subject: LAM: RPI fatal error
Hello,
We are LAM/MPI beginners who are using parallelization to run molecular simulation programs.
We have observed the following error on some machines (it seems to go away when we reboot the machine):
The selected RPI failed to initialize during MPI_INIT. This is a
fatal error; I must abort.
This occurred on host model24.itqb.unl.pt (n0).
The PID of failed process was 30686 (MPI_COMM_WORLD rank: 0)
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 30687 failed on node n0 (193.136.181.161) with exit status 1.
This error is serious for us because it kills all of our parallel jobs that the queuing system directs to the problematic machine.
LAM was simply installed from the RPM on all machines, with no further configuration. The commands we use are lamboot <machines>, mpirun C <executable> and lamhalt.
We tried to search the manual, but it is a bit too advanced for us. Can you explain to us what the problem is and how it can be solved?
Thanks in advance
Sara Campos
---------- Forwarded message ----------
From: Serge Van Criekingen <svancri@yahoo.fr>
To: lam@lam-mpi.org
Date: Thu, 15 Nov 2007 17:11:45 +0100 (CET)
Subject: LAM: Mixing mpic++ and G95
Hello,
I am trying to call a g95 Fortran routine from a C(++) main file.
Here are the 2 files:
globalMain.c:
#include <iostream>
using namespace std;

extern "C" {
    void fortfunc_(int *ii, float *ff);
}

int main( int argc , char* argv[] )
{
    int ii=5;
    float ff=5.5;
    fortfunc_(&ii, &ff);
    return 0;
}
functF.f :
      subroutine fortfunc(ii,ff)
      integer ii
      real*4 ff
      write(6,100) ii, ff
 100  format('ii=',i2,' ff=',f6.3)
      return
      end
I installed LAM/MPI (without any problem) with the g95 Fortran compiler.
My makefile is:
CC=mpiCC
FC=mpif77

each:
	$(CC) -c globalMain.c
	$(FC) -c -fno-second-underscore -fno-underscoring functF.f

all:
	$(CC) globalMain.o functF.o -o go
As a result, "make each" runs fine: no error message.
But "make all" fails:
mpiCC globalMain.o functF.o -o go
functF.o: In function `fortfunc':
functF.f:(.text+0x19): undefined reference to `_g95_get_ioparm'
functF.f:(.text+0x20): undefined reference to `_g95_filename'
functF.f:(.text+0x2b): undefined reference to `_g95_line'
functF.f:(.text+0x3e): undefined reference to `_g95_ioparm'
functF.f:(.text+0x4c): undefined reference to `_g95_ioparm'
functF.f:(.text+0x5b): undefined reference to `_g95_ioparm'
functF.f:(.text+0x6d): undefined reference to `_g95_ioparm'
functF.f:(.text+0x7d): undefined reference to `_g95_st_write'
functF.f:(.text+0x8b): undefined reference to `_g95_transfer_integer'
functF.f:(.text+0x99): undefined reference to `_g95_transfer_real'
functF.f:(.text+0x9e): undefined reference to `_g95_st_write_done'
collect2: ld returned 1 exit status
mpiCC: No such file or directory
make: *** [all] Error 1
Attached is my config.log, and below the laminfo output.
Any comment will be greatly appreciated. Thanks.
Serge Van Criekingen
==========================================
laminfo output:
LAM/MPI: 7.1.4
Prefix: /usr/local
Architecture: x86_64-unknown-linux-gnu
Configured by: crieking
Configured on: Thu Nov 15 10:44:12 CET 2007
Configure host: linux-dgie
Memory manager: ptmalloc2
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: gcc
C++ compiler: g++
Fortran compiler: g95
Fortran symbols: double_underscore
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
C++ exceptions: no
Thread support: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (API v1.1, Module v0.6)
SSI boot: rsh (API v1.1, Module v1.1)
SSI boot: slurm (API v1.1, Module v1.0)
SSI coll: lam_basic (API v1.1, Module v7.1)
SSI coll: shmem (API v1.1, Module v1.0)
SSI coll: smp (API v1.1, Module v1.2)
SSI rpi: crtcp (API v1.1, Module v1.1)
SSI rpi: lamd (API v1.0, Module v7.1)
SSI rpi: sysv (API v1.0, Module v7.1)
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
SSI cr: self (API v1.0, Module v1.0)
============================================
---------- Forwarded message ----------
From: Tim Prins <tprins@lam-mpi.org>
To: General LAM/MPI mailing list <lam@lam-mpi.org>
Date: Thu, 15 Nov 2007 08:23:18 -0800
Subject: Re: LAM: Mixing mpic++ and G95
Try using the Fortran compiler to do the linking, i.e., replace:
$(CC) globalMain.o functF.o -o go
with
$(FC) globalMain.o functF.o -o go
Hope this helps,
Tim
---------- Forwarded message ----------
From: Jeff Squyres <jsquyres@cisco.com>
To: General LAM/MPI mailing list <lam@lam-mpi.org>
Date: Thu, 15 Nov 2007 08:48:38 -0800
Subject: Re: LAM: RPI fatal error
It means that the LAM communication system failed to initialize;
perhaps due to shared memory issues...?
If you're just starting with MPI, I suggest that you go with Open MPI
instead of LAM/MPI -- LAM is fairly static and not evolving anymore.
We're concentrating all of our effort on Open MPI these days
(www.open-mpi.org).
--
Jeff Squyres
Cisco Systems