On Mon, 26 May 2003, Ndong-Nna Guitry-Evrard wrote:
> We would also like to know whether we can use some tools and libraries of
> MPICH-1.2.5 with LAM-6.5.7.
Yes, but you will need to recompile them for LAM.
> It would be very helpful if you could tell us how to launch the LAM daemons on
> a Linux cluster. We have followed all of the procedures in the
> installation guide. LAM is installed on all the nodes. But when we
> tried to execute an example MPI program, the task failed.
> This is the message we consistently receive after the failure:
> ----------------------------------------------------------------------------
> It seems that there is no lamd running on this host, which indicates
> [snipped]
> ----------------------------------------------------------------------------
>
> This is the invocation we used:
>
> rsh $lastNode lamboot -s $lamconfile
> rsh $lastNode cd ~/test1/io
> /usr/local/lam-mpi/bin/mpirun -np $nnodes async -fname razoir
> rsh $lastNode lamhalt
> rm -f $lamconfile
There are a few problems with the command sequence you listed:
- each rsh is distinct and bears no connection to the one before it;
  for example, when you "rsh ... cd ...", the results of the cd are
  lost as soon as that rsh completes
- the mpirun command is not executed on the node where you ran lamboot
  (and evidently the node running your script is not listed in
  $lamconfile); this is exactly why you are getting the error -- LAM
  was not started on the node where you are running this script
- it would be better to put all of these commands into a single
  script and then run *that* script on $lastNode. For example, create
  a script named "mpi_stuff":
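#!/bin/sh
# mpi_stuff LAMCONFILE NNODES: boot LAM, run the job, shut LAM down.
# Shell variables are not carried across rsh, so take them as arguments.
lamconfile=$1
nnodes=$2
lamboot -s "$lamconfile"
cd ~/test1/io
/usr/local/lam-mpi/bin/mpirun -np "$nnodes" async -fname razoir
lamhalt
rm -f "$lamconfile"
and then run it on $lastNode, passing the values along (this assumes
mpi_stuff is executable and in your PATH on $lastNode):
rsh $lastNode mpi_stuff $lamconfile $nnodes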
That should work properly for you.
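To see the first point without a cluster, here is a small local analogy (a sketch, not a LAM tool): `sh -c '...'` stands in for rsh, since each one starts a brand-new shell, just as each "rsh $lastNode ..." starts a fresh shell on the remote node. The function names are made up for illustration. It shows that a cd is lost once its shell exits, and that an unexported variable such as nnodes is not visible in the new shell. rsh is stricter still: it passes none of your local shell variables to the remote side, which is why values like $lamconfile and $nnodes need to be set inside mpi_stuff or given to it on the rsh command line (where your local shell expands them before rsh runs).

```shell
#!/bin/sh
# Local analogy for rsh: each "sh -c '...'" below starts a brand-new shell,
# the same way each "rsh $lastNode ..." starts a fresh shell on the remote node.
# Function names are hypothetical, chosen only for this illustration.

# A cd in one child shell is forgotten once that shell exits.
cd_in_separate_shells() {
    cd / || return 1
    sh -c 'cd /tmp'
    pwd
}

# The cd and the command that depends on it run inside the same shell.
cd_in_same_shell() {
    sh -c 'cd /tmp && pwd'
}

# An unexported variable from the calling shell is not seen by the new shell.
# (rsh goes further: it forwards none of your local variables at all.)
variable_in_new_shell() {
    nnodes=4
    sh -c 'echo "nnodes=[$nnodes]"'
}

cd_in_separate_shells   # prints "/"
cd_in_same_shell        # prints "/tmp"
variable_in_new_shell   # prints "nnodes=[]"
```

Keeping the cd and the mpirun in the same shell, as a single script like mpi_stuff does, is what makes the directory change stick.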
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/