LAM/MPI General User's Mailing List Archives

From: Krzysztof Bandurski (kb_at_[hidden])
Date: 2008-05-19 14:29:43


Hi All,

I used LAM before, but I upgraded my system and installed Fedora 8 from
scratch. I have a dual-core Athlon 64 on an nForce chipset. I wanted to
install some MPI environment quickly to test my parallel programs on my
machine at home before submitting them to the cluster that I use, so I
just "yummed" LAM onto my machine. Lamboot seems to work fine, but I
have a strange problem with mpirun/mpiexec.

When I run a program using mpirun, e.g.:

mpirun -np 4 testpopmpi_release <followed by the command-line
arguments...>

I do get 4 processes running, but each of them sees only itself in
MPI_COMM_WORLD. When I run it with --display-map, I get something like
this at the beginning of the output:

[kris_at_nothing nnworkshop]$ mpirun --display-map -np 4 testpopmpi_release
-packley -d300 -T0f -v1 -Dcgpr -P256 -Mdesa-best2bin
[nothing:05733] Map for job: 1 Generated by mapping mode: byslot
        Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
        Data for app_context: index 0 app: testpopmpi_release
                Num procs: 4
                Argv[0]: testpopmpi_release
                Argv[1]: -packley
                Argv[2]: -d300
                Argv[3]: -T0f
                Argv[4]: -v1
                Argv[5]: -Dcgpr
                Argv[6]: -P256
                Argv[7]: -Mdesa-best2bin
                Env[0]: OMPI_MCA_rmaps_base_display_map=1
                Env[1]: OMPI_MCA_orte_precondition_transports=444a2d3c430e64ba-6534b32b337c12e7
                Env[2]: OMPI_MCA_rds=proxy
                Env[3]: OMPI_MCA_ras=proxy
                Env[4]: OMPI_MCA_rmaps=proxy
                Env[5]: OMPI_MCA_pls=proxy
                Env[6]: OMPI_MCA_rmgr=proxy
                Working dir: /home/kris/nnworkshop (user: 0)
                Num maps: 0
        Num elements in nodes list: 1
        Mapped node:
                Cell: 0 Nodename: nothing Launch id: -1 Username: NULL
                Daemon name:
                        Data type: ORTE_PROCESS_NAME Data Value: NULL
                Oversubscribed: True Num elements in procs list: 4
                Mapped proc:
                        Proc Name:
                        Data type: ORTE_PROCESS_NAME Data Value: [0,1,0]
                        Proc Rank: 0 Proc PID: 0 App_context index: 0

                Mapped proc:
                        Proc Name:
                        Data type: ORTE_PROCESS_NAME Data Value: [0,1,1]
                        Proc Rank: 1 Proc PID: 0 App_context index: 0

                Mapped proc:
                        Proc Name:
                        Data type: ORTE_PROCESS_NAME Data Value: [0,1,2]
                        Proc Rank: 2 Proc PID: 0 App_context index: 0

                Mapped proc:
                        Proc Name:
                        Data type: ORTE_PROCESS_NAME Data Value: [0,1,3]
                        Proc Rank: 3 Proc PID: 0 App_context index: 0

and then the output of my program follows. As you can see, LAM thinks
that all the processes are in the same communicator (they all have
different ranks), but when I call MPI_Comm_rank and MPI_Comm_size in my
program, I always get rank == 0 and size == 1 in every single process.
Needless to say, the processes can't communicate and I just have 4
independent copies of my program running (and printing exactly the same
output on the terminal...). Does anyone have any idea what might be
going on? This is really driving me nuts; I would appreciate any hints.
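
In case it helps, here is a minimal sketch of the kind of rank/size
check I mean (not my actual testpopmpi_release program, just the same
two MPI calls):

/* minimal rank/size check - a stripped-down sketch, not the real
 * testpopmpi_release program */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* with "mpirun -np 4" I would expect size == 4 and ranks 0..3,
     * but in my real program every process reports rank 0, size 1 */
    printf("rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}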

best regards,

kris.