LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2007-06-08 00:49:23


On Jun 7, 2007, at 10:44 PM, chenyong wrote:

> Is it normal for a machine in a cluster to be assigned a different
> rank number in different runs?
> In my case, there are three machines (hpc01, hpc02, hpc03) in the
> cluster. The content of the file mpd.hosts is as follows:
>
> hpc01
> hpc02
> hpc03
>
> I found that in some runs hpc01 has rank 0, hpc02 has rank 1, and
> hpc03 has rank 2; the order of rank numbers simply follows the order
> of machine names listed in the file.
> However, in some other runs hpc01 has rank 0, hpc02 has rank 2, and
> hpc03 has rank 1; the order does not follow the order in the file.
> Is this normal or not?

Are you using LAM/MPI or MPICH2? The mpd.hosts file suggests MPICH2, in
which case you would be best off asking on the MPICH lists. This
behavior would be highly unusual for LAM/MPI. If you are using
LAM/MPI, are you always running lamboot from the same node? Are you
running multiple jobs at the same time?
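
To confirm that the mapping really changes, it may help to have each
process print its rank and host name, save the output of two runs, and
compare them. Below is a minimal sketch of that comparison in Python. It
assumes each saved line has the form "rank hostname"; that format and the
function names parse_run and differing_ranks are illustrative only, not
something LAM/MPI or MPICH2 produces or provides.

```python
def parse_run(lines):
    """Build a rank -> hostname map from lines such as '0 hpc01'."""
    mapping = {}
    for line in lines:
        parts = line.split()
        # Skip blank lines and any output that is not 'rank hostname'.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        mapping[int(parts[0])] = parts[1]
    return mapping


def differing_ranks(run_a, run_b):
    """Return the sorted ranks whose host differs between two runs."""
    a, b = parse_run(run_a), parse_run(run_b)
    return sorted(r for r in a.keys() | b.keys() if a.get(r) != b.get(r))


# Sample data mirroring the two orderings described in the question.
run1 = ["0 hpc01", "1 hpc02", "2 hpc03"]
run2 = ["0 hpc01", "2 hpc02", "1 hpc03"]
print(differing_ranks(run1, run2))
```

An empty list means both runs placed every rank on the same host; any
ranks listed are the ones that moved.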

Thanks,

Brian

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!