
LAM/MPI General User's Mailing List Archives


From: Warner Yuen (wyuen_at_[hidden])
Date: 2006-06-12 13:53:32


Hello LAM/MPIers:

I have compiled the Gromacs MD program for both PPC and Intel, and I can
run each binary on its corresponding architecture. But when I try
to do an mpirun across the two different machines, the job fails
after a few seconds. Each machine can see and execute its
architecture-specific binary, and the LAM/MPI installation is the
universal build of LAM/MPI 7.1.2 from the website. Any ideas on what
I might be doing wrong? Below is the only output that I'm getting:

calculon$ mpirun -np 4 mdrun
NNODES=4, MYRANK=1, HOSTNAME=Warner-Computer.local
NNODES=4, MYRANK=3, HOSTNAME=Warner-Computer.local
NNODES=4, MYRANK=0, HOSTNAME=portal.private
NNODES=4, MYRANK=2, HOSTNAME=portal.private
NODEID=2 argc=1
NODEID=0 argc=1
NODEID=1 argc=16777216
NODEID=3 argc=16777216
MPI_Recv: message truncated (rank 2, MPI_COMM_WORLD)
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): - MPI_Recv()
Rank (2, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 341 failed on node n0 (10.0.1.1) with exit status 1.
-----------------------------------------------------------------------------
calculon$
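One detail in the output that may be worth noting: the garbled argc values
could hint at a byte-order mismatch between the two architectures. 16777216
is 0x01000000, i.e. the 32-bit integer 1 with its bytes reversed, and the
two ranks reporting it are on a different machine than the two reporting
argc=1 (PowerPC is big-endian, Intel is little-endian). A quick Python
sketch of that interpretation (illustrative only; the variable names here
are mine, not from the log):

```python
import struct

# argc should be 1 on every rank; two ranks report 16777216 instead.
big_endian_one = struct.pack(">i", 1)              # bytes: 00 00 00 01
swapped = struct.unpack("<i", big_endian_one)[0]   # reinterpret as little-endian
print(swapped)  # 16777216 == 0x01000000, matching the bad argc values
```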

Warner Yuen
Research Computing Consultant
Apple Computer
email: wyuen_at_[hidden]
Tel: 408.718.2859
Fax: 408.715.0133