On Jun 7, 2007, at 10:44 PM, chenyong wrote:
> Is it normal for a machine in a cluster to be assigned a different
> rank number in different runs?
> In my case, there are three machines (hpc01, hpc02, hpc03) in the
> cluster. The content of the file mpd.hosts is as follows:
>
> hpc01
> hpc02
> hpc03
>
> I found that in some runs, hpc01 has rank number '0', hpc02 has rank
> number '1', and hpc03 has rank number '2';
> the order of rank numbers just follows the order of machine names
> listed in the file.
> However, in some other runs, hpc01 has rank number '0', hpc02 has
> rank number '2', and hpc03 has rank number '1'.
> The order does not follow the order of names in the file.
> Is this normal or not?
Are you using LAM/MPI or MPICH2? The mpd.hosts file suggests MPICH2,
in which case you would be best off asking the MPICH lists. This
behavior would be highly unusual for LAM/MPI. If you are using
LAM/MPI, are you always running lamboot from the same node? Are you
running multiple jobs at the same time?
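
To compare runs, it may help to record which host each rank landed on
(for example, by having every process print its rank and host name) and
then check that mapping against the host file. The following is only a
rough sketch of that check: the function name follows_host_order is
hypothetical, the data is copied by hand from the two runs described
above, and nothing here calls into LAM/MPI or MPICH2.

```python
def follows_host_order(hosts, rank_to_host):
    """Return True if ranks 0..n-1 sit on hosts in the listed order.

    hosts        -- host names in the order they appear in the host file
    rank_to_host -- mapping of rank number to the host it was observed on
    """
    # Walk the ranks in ascending order and collect the host for each one.
    observed = [rank_to_host[rank] for rank in sorted(rank_to_host)]
    # Compare against the same number of leading entries of the host file.
    return observed == list(hosts[:len(observed)])


# Host order as listed in mpd.hosts in the question.
hosts = ["hpc01", "hpc02", "hpc03"]

# Rank-to-host mappings transcribed from the two kinds of runs reported.
run_a = {0: "hpc01", 1: "hpc02", 2: "hpc03"}
run_b = {0: "hpc01", 1: "hpc03", 2: "hpc02"}

print(follows_host_order(hosts, run_a))  # True
print(follows_host_order(hosts, run_b))  # False
```

Keeping such a record for each run, along with the node the job was
launched from, makes it easier to see whether the ordering changes
together with anything else.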
Thanks,
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!