LAM/MPI General User's Mailing List Archives

From: rob fiske (rfiske__at_[hidden])
Date: 2006-02-08 12:27:23


I am having trouble getting NWChem to run in parallel. The code works fine
on one node with two processors under LAM, but fails as soon as I try to
include a second machine. I am, however, able to run CPMD in parallel across
multiple nodes without problems. The NWChem mailing list referred me here.

The error message seems to indicate conflicting versions of LAM, but each of
the two machines I have tried has only one version installed, and laminfo
confirms that they are the same version. Since other programs run in
parallel without trouble, I wonder what could be causing this conflict.

Here is the error message:

==============================================
Palladium:~/tests/QM/BH4_N fiske$ mpirun C /usr/local/NWChem/bin/nwchem
tests/QM/BH4_N/test.nw
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun chose a different RPI than its peers. For example, at least
the following two processes mismatched in their RPI selections:

    MPI_COMM_WORLD rank 0: tcp (v7.0.0)
    MPI_COMM_WORLD rank 2: usysv (v7.1.0)

All MPI processes must choose the same RPI module and version when
they start. Check your SSI settings and/or the local environment
variables on each node.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
The selected RPI failed to initialize during MPI_INIT. This is a
fatal error; I must abort.

This occurred on host Cobalt (n1).
The PID of failed process was 15412 (MPI_COMM_WORLD rank: 2)
==============================================

Both machines have LAM 7.0.6 installed, run Mac OS X 10.3.9, and have G4
CPUs.

Has anyone encountered a problem like this before? I have already tried
giving mpirun the -ssi option, as suggested elsewhere on this list.
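
For reference, the -ssi option takes a parameter name and a value. The exact
invocation tried is not recorded in this post, so what follows is only a
sketch of pinning every rank to one RPI. It assumes the tcp module named in
the error output above, and it assumes LAM_MPI_SSI_rpi is the matching
environment variable for the rpi parameter:

```shell
# Sketch only: ask every MPI process to select the tcp RPI module,
# rather than letting each process choose its own default.
mpirun -ssi rpi tcp C /usr/local/NWChem/bin/nwchem tests/QM/BH4_N/test.nw

# Assumed equivalent: set the same SSI parameter through the environment
# before calling mpirun.
export LAM_MPI_SSI_rpi=tcp
mpirun C /usr/local/NWChem/bin/nwchem tests/QM/BH4_N/test.nw
```

If the mismatch persists with the RPI forced like this, the differing module
versions in the error output (v7.0.0 and v7.1.0) are the next thing to
investigate.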

Thank you for your time and any assistance.

Robert Fiske