This is quite odd -- I can't think of a reason why this would happen.
Do you get core files from your MPI processes, perchance?
Some other suggestions:
- can you lamexec multiple non-MPI processes on a single node? e.g.,
"lamexec n0 n0 uptime"
- can you mpirun a simple hello world MPI process on a single node?
e.g., "mpirun n0 n0 hello"
- if you have ssh running in the cluster, can you try using the rsh/ssh
boot module instead of bproc? I *doubt* that bproc is the issue, but
you never know -- i.e., if you use a different module and the same
results happen, then it's *probably* not the boot module that's at
fault.
On Sep 15, 2004, at 12:56 PM, Kaveh Moallemi - CSCI/P2003 wrote:
> Hello,
>
> I've installed lam-7.0.6 integrated with bproc-3.2.6 on a small 4-node
> cluster. I can execute mpi programs on the cluster and they run just
> fine .... however, if I try to run more than one process per node (since
> the nodes are dual processors) I get the following error message:
>
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
> I should note that 3 of the 4 nodes in the cluster are diskless and
> thus have a minimal root file system (an 8-Meg ramdisk). I suspect
> that I'm doing something wrong in my setup .... does anyone have any
> ideas? Why can I run a single mpi process on each node but not 2?
>
> Any help would be greatly appreciated.
>
>
> Kaveh
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/