Brian --
So you're the PBS guy -- any hints on this one?
-Vishal
On Mon, 19 Apr 2004, Drake, Richard R wrote:
# Hi,
#
# I could not find the definitive answer to this one by scanning the mail
# archives:
#
# Using v 7.0.2 of LAM/MPI on a Linux rack running PBS, I cannot set the
# session prefix using the LAM_MPI_SESSION_PREFIX (sp) environment variable.
# My workaround is to unsetenv the PBS_JOBID environment variable before
# running lamboot.
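#
# Roughly, what I do in the PBS job script is something like this (csh
# syntax; the prefix path and hostfile name are just placeholders):
#
#     unsetenv PBS_JOBID                          # keep lamboot from keying off the PBS job id
#     setenv LAM_MPI_SESSION_PREFIX /tmp/my-lam   # the session prefix I actually want
#     lamboot hostfile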
#
# Is this the expected and desired behavior?
#
# I can explain my use case and reasoning if needed.
#
# Thanks,
#
# -rich
#
#
# -----Original Message-----
# From: Vishal Sahay [mailto:vsahay_at_[hidden]]
# Sent: Monday, April 19, 2004 1:57 PM
# To: General LAM/MPI mailing list
# Subject: RE: LAM: Help: MPI_Irecv and pthreads.
#
#
# Hi --
#
#
# # In gmx381_new_iter9, the error message is
# # Node 13: error opening file /home/jr241/gmx381_new_iter9/local/dbout.0013
# #
# # Node 13: error opening file /home/jr241/gmx381_new_iter9/local/d3plot06
# #
# # Node 12: error opening file /home/jr241/gmx381_new_iter9/local/dbout.0012
# #
# # Node 4: error opening file /home/jr241/gmx381_new_iter9/local/dbout.0004
# #
#
# You might want to check whether read access to these files / directories is
# somehow restricted for your user.
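#
# For example, something along these lines on the node in question (the paths
# are taken from your error output):
#
#     ls -ld /home/jr241/gmx381_new_iter9/local
#     ls -l  /home/jr241/gmx381_new_iter9/local/dbout.0013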
#
# # MPI_Recv: process in local group is dead (rank 7, MPI_COMM_WORLD)
# # MPI_Recv: process in local group is dead (rank 6, MPI_COMM_WORLD)
# # MPI_Recv: process in local group is dead (rank 11, MPI_COMM_WORLD)
# # MPI_Recv: process in local group is dead (rank 10, MPI_COMM_WORLD)
# #
# # Rank (7, MPI_COMM_WORLD): Call stack within LAM:
# # Rank (7, MPI_COMM_WORLD): - MPI_Recv()
# # Rank (7, MPI_COMM_WORLD): - MPI_Bcast()
# # Rank (7, MPI_COMM_WORLD): - MPI_Allreduce()
# # Rank (7, MPI_COMM_WORLD): - main()
# #
# # Rank (6, MPI_COMM_WORLD): Call stack within LAM:
# # Rank (6, MPI_COMM_WORLD): - MPI_Recv()
# # Rank (6, MPI_COMM_WORLD): - MPI_Bcast()
# # Rank (6, MPI_COMM_WORLD): - MPI_Allreduce()
# # Rank (6, MPI_COMM_WORLD): - main()
# # Rank (10, MPI_COMM_WORLD): Call stack within LAM:
#
#
# This happens when some MPI process in your parallel job has failed and
# crashed, so the other processes that try to contact it report these
# errors. You might want to look at your code and figure out why those
# processes crash. Common causes are memory corruption or some other invalid
# operation that terminates the process prematurely.
#
# -Vishal
# _______________________________________________
# This list is archived at http://www.lam-mpi.org/MailArchives/lam/
#