LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-07-19 17:20:58


On Jul 19, 2005, at 1:57 PM, Yeliang Zhang wrote:

> I got the following errors when I tried to run an executable. The code
> was working fine before, but suddenly it stopped working with an RPI
> problem. Do you know how to solve this?
>
> [zhang_at_minime64 testing]$ mpirun n0-7 -np 8 test.x
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun chose a different RPI than its peers. For example, at least
> the following two processes mismatched in their RPI selections:
>
> MPI_COMM_WORLD rank 4: usysv (v7.1.0)
> MPI_COMM_WORLD rank 1: gm (v1.2.0)
>
> All MPI processes must choose the same RPI module and version when
> they start. Check your SSI settings and/or the local environment
> variables on each node.
> -----------------------------------------------------------------------------

This means that for some reason two processes in your cluster found
different "best" transports. One (rank 1) wanted to use Myrinet/GM
and another (rank 4) wanted to use shared memory (usysv). If
everyone should be able to use gm, that indicates that gm might not
be properly set up on one of your nodes. If some of your nodes have
Myrinet/GM and others don't, you may need to tell LAM to use the
usysv RPI explicitly. You can do this on the mpirun command line:

   % mpirun <normal options> -ssi rpi usysv <application>
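A hedged sketch of both options applied to the command from the original post. The `run_with_rpi` function is hypothetical (not part of LAM), and `MPIRUN` is an assumed override so the command line can be printed instead of executed. Passing `usysv` pins every process to the usysv RPI, which uses shared memory between processes on the same node and TCP between nodes. Passing `gm` instead should make any process that cannot initialize GM fail at startup rather than quietly pick a different RPI, which may help narrow down the misconfigured node.

```shell
#!/bin/sh
# Hypothetical helper, not part of LAM/MPI: rerun the job from the original
# post with the RPI module pinned via the SSI "rpi" parameter.
# MPIRUN is an assumed override; it defaults to LAM's mpirun, and setting
# MPIRUN=echo prints the command line instead of launching anything.
run_with_rpi() {
    rpi=$1
    ${MPIRUN:-mpirun} n0-7 -np 8 -ssi rpi "$rpi" test.x
}

# Usage (pick one):
#   run_with_rpi gm      # diagnose: every process must be able to use GM
#   run_with_rpi usysv   # work around: shared memory on-node, TCP between nodes
```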

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/