LAM/MPI General User's Mailing List Archives

From: daniel.egloff_at_[hidden]
Date: 2004-11-03 09:17:09


Dear lam/mpi list

I use LAM/MPI 7.0.6-4 (source URLs below), which I recompiled from
source because the Debian package strips symbol information and
therefore does not work with the TotalView debugger. (It would be a
good idea to mention that to the Debian package builders too.)

I observe the following odd behaviour (which I did not have with the
7.0.6-2 Debian binary package from the Debian package archive). FYI:
ring is the ring example application from the lam examples:

******************************************

LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University

e3050@platosrv:~/workspace/lam-examples/examples/main/ring$ mpirun n0,0,0,0 ring
-----------------------------------------------------------------------------
The selected RPI failed to initialize during MPI_INIT. This is a
fatal error; I must abort.

This occurred on host
platosrv-----------------------------------------------------------------------------
 (n0The selected RPI failed to initialize during MPI_INIT. This is a
).
fatal error; I must abort.
The PID of failed process was 1076
 (MPI_COMM_WORLD rank: 0)
-----------------------------------------------------------------------------
This occurred on host platosrv (n0).
The PID of failed process was 1077 (MPI_COMM_WORLD rank: 1)
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 1078 failed on node n0 (147.50.18.157) with exit status 1.
-----------------------------------------------------------------------------

********************************************

If I use only 3 processes on the same node, i.e. mpirun n0,0,0 ring,
things work.
I have had applications which would only run with at most 2 processes
on the "root node".
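
To narrow down the process count at which the failure starts, the node
specification can be generated rather than typed by hand. The following
is only a sketch: node_spec is a hypothetical helper, not part of LAM,
and it assumes (as in the commands above) that repeating node 0 in the
list starts one more process on n0. The loop only prints the commands
to try.

```shell
#!/bin/sh
# node_spec COUNT: hypothetical helper that builds a node list placing
# COUNT processes on node n0, e.g. "n0,0,0" for COUNT=3.
node_spec() {
    count=$1
    spec="n0"
    i=1
    while [ "$i" -lt "$count" ]; do
        spec="$spec,0"      # one more process on node 0
        i=$((i + 1))
    done
    echo "$spec"
}

# Print (do not run) the mpirun commands for 2, 3 and 4 processes.
for n in 2 3 4; do
    echo "mpirun $(node_spec "$n") ring"
done
```

Running the printed commands one at a time, smallest count first, shows
the first count at which MPI_INIT aborts.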

Once I get errors like the above, I also have to run a lamhalt /
lamboot hostfile sequence before mpirun works again without errors.
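
The recovery sequence can be wrapped in a small script. This is a
minimal sketch, not a LAM tool: lam_restart_and_run, run and the
DRY_RUN variable are hypothetical names, and "hostfile" stands for
whatever boot schema file is used with lamboot. With DRY_RUN=1 the
commands are only printed, so the sequence can be checked without
touching the running LAM universe.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# lam_restart_and_run HOSTFILE MPIRUN_ARGS...
# Halts LAM, boots it again from HOSTFILE, then reruns the job.
# The chain stops at the first step that fails.
lam_restart_and_run() {
    hostfile=$1
    shift
    run lamhalt &&
    run lamboot "$hostfile" &&
    run mpirun "$@"
}

# Dry run: prints the three commands without executing them.
DRY_RUN=1 lam_restart_and_run hostfile n0,0,0 ring
```

Dropping the DRY_RUN=1 prefix from the last line executes the same
three commands for real.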

Somewhere I stumbled over a note from Jeff saying that such errors are
going to be fixed in LAM 7.1.x.
Do I need to switch to version 7.1.x?

A quick answer will be very much appreciated.

Debian source package which I recompiled:
http://ftp.debian.org/debian/pool/main/l/lam/lam_7.0.6-4.dsc
http://ftp.debian.org/debian/pool/main/l/lam/lam_7.0.6.orig.tar.gz
http://ftp.debian.org/debian/pool/main/l/lam/lam_7.0.6-4.diff.gz

Best regards,

Daniel Egloff
Zürcher Kantonalbank, VFK
Lagerstrasse 47, 8004 Zürich
Tel. +41 (0) 1 292 45 33, Fax +41 (0) 1 292 45 93
Briefadresse: Postfach, 8010 Zürich, http://www.zkb.ch