On Thu, 22 May 2003, Yu Chen wrote:
> But when I run the lam-tests-6.5.9, it gives out:
> ........
> *** Testing -lamd mode ***
> make[2]: Entering directory `/lamtests-6.5.9/ccl'
> mpirun -x TEST -np 2 -s h -lamd -O /lamtests-6.5.9/ccl/allgather
> mpirun: cannot start ../reporting/collector on n0 (o): No such file or
> directory
> mpirun -x TEST -np 2 -s h -lamd -O /lamtests-6.5.9/ccl/allreduce
> mpirun: cannot start ../reporting/collector on n0 (o): No such file or
> directory
> mpirun -x TEST -np 2 -s h -lamd -O /lamtests-6.5.9/ccl/alltoall
> mpirun: cannot start ../reporting/collector on n0 (o): No such file or
> directory
> .......
That's quite odd, and clearly shouldn't be happening.
> While the file is actually there, with the right permission, and on NFS.
> I am really lost here, I would highly appreciate if anyone could give me
> some advices.
Note that the tests are actually running to completion (apparently
successfully) -- the collector is simply a program that runs to collect
any errors that may have occurred on remote nodes. I'm guessing that this
is simply a minor error in the testing harness; I'm sure that it does not
indicate that your LAM/MPI installation is broken.
I don't know offhand why this is happening, though, so let me ask a few
questions:
- are you running the test suite on the same node that you lambooted from?
- is /lamtests-6.5.9 really NFS exported to, and mounted on all nodes?
- does /lamtests-6.5.9/reporting/collector exist on the node that you ran
  the test suite on, and have (at least) permissions 555? (I know you
  mentioned this, but I want to nail down what the "right permissions" are)
- can you manually run the collector? e.g.,
    cd /lamtests-6.5.9/reporting
    mpirun -s h N collector
    cd ../ccl
    mpirun -s h N ../reporting/collector
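
If it helps with the permissions question above, here is a rough sketch of a check you could run by hand on each node. The function name check_collector is made up for illustration and is not part of LAM or the test suite. It only confirms that the file exists and that its mode string shows read and execute for user, group, and other (i.e. at least 555).

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM/MPI or lamtests): report whether a
# file exists and is readable and executable by user, group, and other.
check_collector() {
    f="$1"
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    # ls -l prints the mode string first, e.g. -r-xr-xr-x; -L follows symlinks.
    mode=$(ls -lL "$f" | cut -c1-10)
    case "$mode" in
        # r and x must be set in all three triplets; the w bits may be anything.
        -r?xr?xr?x)
            echo "ok: $f ($mode)"
            return 0
            ;;
        *)
            echo "bad permissions: $f ($mode)"
            return 1
            ;;
    esac
}

# Example call, using the path from this thread:
#   check_collector /lamtests-6.5.9/reporting/collector
```

A file with setuid, setgid, or sticky bits shows other letters in the x positions, so this sketch would report it as bad; that seems unlikely for the collector but is worth knowing.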
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/