LAM/MPI General User's Mailing List Archives

From: Vishal Sahay (vsahay_at_[hidden])
Date: 2004-08-04 12:00:00


Hi --

Can you try the Subversion (repository) version of LAM, or the LAM
nightly tarball from http://www.lam-mpi.org/svn/, and see whether you still
get the same problem? A few fixes have gone in there.

You can also read about using Infiniband with LAM in the LAM docs
under <your/LAM/source/dir>/doc.

When you use -ssi rpi ib, Infiniband will be used even for the four
lamnodes on the same (4 CPU) machine.

Just a note: when you have your own private installation of LAM, you do
not need to pass "-I/xxx" explicitly. It is embedded in the
mpicc/mpiCC wrapper compilers. You can verify this with the "-showme"
option to the wrapper compilers, e.g.: mpicc -showme
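
Regarding the possible mix of 6.5.4 and 7.1 beta: a minimal sketch, assuming
a POSIX sh or bash shell, for listing every copy of a command that is visible
on a PATH-style list, in search order. The first line printed is the copy the
shell would actually run. The helper name find_on_path is only an
illustration, not something shipped with LAM.

```shell
# List every executable called NAME found in the colon-separated
# directory list DIRS, in search order.
# find_on_path is a hypothetical helper, not part of LAM.
find_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Usage: show every mpicc and mpirun this shell can see.
find_on_path mpicc "$PATH"
find_on_path mpirun "$PATH"
```

If the first mpicc and the first mpirun do not come from the same
installation prefix, the two versions are probably being mixed. Running
"mpicc -showme" on the first hit then shows which include and library
directories that wrapper really uses.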

Thanks!
-Vishal

On Tue, 3 Aug 2004, Karl Hahn wrote:

# Hi again,
#
# I have problems using Infiniband. This is my first
# encounter with Infiniband, so I only know that it
# is a kind of 'fast network' :-)
#
# I have compiled a small program which I can run
# with tcp and lamd rpi. With rpi ib I get errors.
#
# 1) There are four lamnodes on a quad CPU machine.
# So Infiniband is not actually used(?). But it should
# be possible to run the program. Is this right?
#
# 2) Can I find detailed error messages of LAM
# anywhere?
#
# 3) Can someone recommend a kind of 'Infiniband
# tutorial for LAM'? The cluster is connected with
# Ethernet and Infiniband. Which one is used? How
# can I control this?
#
# 4) It _could_ be possible that I have mixed
# two versions of LAM (installed is a 6.5.4(!),
# I use a 7.1.beta). I tried to avoid problems
# using a new $PATH, $LD_LIBRARY_PATH, and
# -I/xxx in the Makefile. But I am not sure ...
#
# Sorry for the beginners' questions!
# Charlie
#
#
# And here is the output of my program:
#
# $ lamboot
#
# LAM 7.1b13/MPI 2 C++/ROMIO - Indiana University
#
# $ lamnodes
# n0 node:4:origin,this_node
#
# $ mpirun -c 4 -ssi rpi tcp ./stresstest
#
# Try to initialize cluster...
# running 4 processes ...
# checking the LAM nodes ...
# n0 node:4:origin,this_node
#
# Broadcasting message with size 100MB to 3 slaves.
# Broadcast done.
# Gathering data.
# Gather done.
#
# $ mpirun -c 4 -ssi rpi ib ./stresstest
# -----------------------------------------------------------------------------
# An erroneous completion was generated while polling for the Infiniband
# completion queue
#
# The exact error string returned by Infiniband API is as follows:
#
# -----------------------------------------------------------------------------
# An erroneous completion was generated while polling for the Infiniband
# completion queue
#
# The exact error string returned by Infiniband API is as follows:
#
# "Operation Completed Successfully"
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
# An erroneous completion was generated while polling for the Infiniband
# completion queue
#
# The exact error string returned by Infiniband API is as follows:
#
# "Operation Completed Successfully"
# -----------------------------------------------------------------------------
# "Operation Completed Successfully"
# -----------------------------------------------------------------------------
# MPI_Recv: internal MPI error: Invalid argument (rank 0, MPI_COMM_WORLD)
# Rank (0, MPI_COMM_WORLD): Call stack within LAM:
# Rank (0, MPI_COMM_WORLD): - MPI_Recv()
# Rank (0, MPI_COMM_WORLD): - MPI_Init()
# Rank (0, MPI_COMM_WORLD): - main()
# MPI_Send: internal MPI error: Invalid argument (rank 3, MPI_COMM_WORLD)
# Rank (3, MPI_COMM_WORLD): Call stack within LAM:
# Rank (3, MPI_COMM_WORLD): - MPI_Send()
# Rank (3, MPI_COMM_WORLD): - MPI_Init()
# Rank (3, MPI_COMM_WORLD): - main()
# -----------------------------------------------------------------------------
# An erroneous completion was generated while polling for the Infiniband
# completion queue
#
# The exact error string returned by Infiniband API is as follows:
#
# "Operation Completed Successfully"
# -----------------------------------------------------------------------------
# MPI_Send: internal MPI error: Invalid argument (rank 1, MPI_COMM_WORLD)
# Rank (1, MPI_COMM_WORLD): Call stack within LAM:
# Rank (1, MPI_COMM_WORLD): - MPI_Send()
# Rank (1, MPI_COMM_WORLD): - MPI_Init()
# Rank (1, MPI_COMM_WORLD): - main()
# MPI_Send: internal MPI error: Invalid argument (rank 2, MPI_COMM_WORLD)
# Rank (2, MPI_COMM_WORLD): Call stack within LAM:
# Rank (2, MPI_COMM_WORLD): - MPI_Send()
# Rank (2, MPI_COMM_WORLD): - MPI_Init()
# Rank (2, MPI_COMM_WORLD): - main()
# -----------------------------------------------------------------------------
# One of the processes started by mpirun has exited with a nonzero exit
# code. This typically indicates that the process finished in error.
# If your process did not finish in error, be sure to include a "return
# 0" or "exit(0)" in your C code before exiting the application.
#
# PID 18491 failed on node n0 (x.x.x.x) with exit status 1.
# -----------------------------------------------------------------------------