On Mon, 21 Jan 2002, Kieun Kim wrote:
> I get an error when I run mpirun. My paths for mpicc and mpirun are
> correct.
>
> The following message comes out.
>
> Can't open the input file: /pliershome/kieun3/MPI/rand_data.txt
Does this file exist, at this exact path, on every node that you're trying
to run on? An input file that is missing on some nodes is the most common
reason that these kinds of error messages appear.
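One way to see which node is missing the file is to have the program report its hostname and the reason the open failed. The following is a minimal sketch in plain C, not taken from the original program: the helper name check_input_file is hypothetical, and it assumes a POSIX system that provides gethostname().

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` for reading and report the
 * result together with the local hostname. Returns 0 on success, or the
 * errno value left by fopen() on failure. */
int check_input_file(const char *path)
{
    char host[256];
    FILE *fp;
    int err;

    /* gethostname() may leave the buffer unterminated if the name is
     * truncated, so terminate it explicitly. */
    if (gethostname(host, sizeof(host) - 1) != 0)
        strcpy(host, "unknown");
    host[sizeof(host) - 1] = '\0';

    fp = fopen(path, "r");
    if (fp == NULL) {
        err = errno; /* save it before another call can overwrite it */
        fprintf(stderr, "%s: can't open %s: %s\n", host, path, strerror(err));
        return err;
    }

    fclose(fp);
    printf("%s: %s is readable\n", host, path);
    return 0;
}
```

Calling check_input_file("/pliershome/kieun3/MPI/rand_data.txt") at startup in every process would show which hosts cannot see the file and why (for example "No such file or directory").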
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 21724 failed on node n0 with exit status 1.
> -----------------------------------------------------------------------------
This is a consequence of the first error -- your program bailed on at least
one node, so LAM decided to kill your entire application.
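To illustrate where the nonzero exit status comes from, here is a minimal sketch in plain C of a program body that returns a failure status when its input file cannot be opened. The function name run_job is hypothetical, the MPI calls are left out, and this is an assumption about how such a program is typically structured, not the original code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical program body: a real main() would simply do
 * `return run_job(argv[1]);`, so the value returned here becomes the
 * exit status of the process. */
int run_job(const char *input_path)
{
    FILE *fp = fopen(input_path, "r");

    if (fp == NULL) {
        perror(input_path);  /* prints the path and the reason */
        return EXIT_FAILURE; /* nonzero: reported as a failed process */
    }

    /* ... read the data and do the real work here ... */

    fclose(fp);
    return EXIT_SUCCESS;     /* zero: clean exit */
}
```

On most systems EXIT_FAILURE is 1, which would match the "exit status 1" in the message quoted above.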
> So I thought there was no daemon on each node. After that, I wanted to
> compile and run the LAM test suite using the downloaded lamtests-6.5.6.
>
> When I ran "make", I got an error.
>
> "lamtest_errors.174911-01212002" file created.
>
> (none):(none):mpirun/spawn_appschema/-lamd:0
> [snipped]
These types of spawn errors typically happen when you don't have a common
filesystem and the executables are not available on all nodes. This tends
to support my guess above about what is going wrong with your program.
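The same kind of check can be applied to the executables themselves. The following is a minimal sketch, assuming a POSIX system; the helper name is_runnable is hypothetical, and it would have to be run on each node (for example through whatever remote shell you already use) to be useful.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `path` is a regular file that the
 * calling user may execute on this machine, 0 otherwise. */
int is_runnable(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return 0;
    }
    /* access(X_OK) also succeeds for searchable directories, so rule
     * those out first. */
    if (!S_ISREG(st.st_mode)) {
        fprintf(stderr, "%s: not a regular file\n", path);
        return 0;
    }
    if (access(path, X_OK) != 0) {
        fprintf(stderr, "%s: not executable: %s\n", path, strerror(errno));
        return 0;
    }
    return 1;
}
```

If this returns 0 for the test binaries on any node, that node cannot see the executables, which is consistent with the missing common filesystem described above.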
> <--------------------------------------------------------------------------->
> lamd:0:file_status_get_count.c:97
> ERROR: MPI_Get_count returned the incorrect value.
> Was expecing: -1073744068, MPI_Get_count returned -1073744072
These one-sided errors were fixed in a recent version of LAM -- you may
wish to update to 6.5.6.
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/