A few things:
- You probably don't need to edit the lam-bhost.def file -- it's really
only intended as a default. Instead, you can write your own hostfile
and use "lamboot <hostfile>", where <hostfile> is the filename of your
hostfile.
- Can you be specific about what errors occur when you try to lamboot
the head node + compute nodes?
- Check with the administrators of your cluster and see how they intend
it to be used. It may well be that they don't want you to use lamboot
on the head node, and instead *only* run MPI applications in PBS jobs.
- To lamboot in a PBS job, you have two main options: use "lamboot
$PBS_NODEFILE" ($PBS_NODEFILE is an environment variable that is
defined inside PBS jobs, and will contain a list of the nodes that have
been allocated to your PBS job), or you can build LAM with native PBS
support and simply use "lamboot" ($PBS_NODEFILE is unnecessary here --
LAM will directly obtain the list of nodes allocated to you from PBS).
Consult the LAM/MPI Installation Guide for information on how to build
LAM with native PBS support.
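For the first option, a hostfile (LAM calls it a boot schema) is just a list of host names, one per line. Here is a minimal sketch. It assumes hypothetical node names node01 to node03, which you would replace with your cluster's real host names. The optional "cpu=2" suffix tells LAM that the node has two CPUs.

```shell
#!/bin/sh
# Write a LAM boot schema (hostfile) named my_hosts.
# node01..node03 are hypothetical names: use your own cluster's nodes.
cat > my_hosts <<'EOF'
# One host per line; lines starting with '#' are comments.
node01
node02
node03 cpu=2
EOF
```

You would then boot LAM on those nodes with "lamboot -v my_hosts".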
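The "lamboot $PBS_NODEFILE" approach might look roughly like the PBS job script below. This is a sketch, not a tested recipe. The job name, the resource request (4 nodes with 2 processors each), the walltime and the program name ./my_mpi_program are all placeholders you would need to adapt. The script boots LAM on exactly the nodes PBS allocated and runs the program with "mpirun C", which starts one copy per CPU that LAM knows about. It then shuts LAM down with "lamhalt" so that no LAM daemons are left behind when the job ends.

```shell
#!/bin/sh
#PBS -N lam_example
#PBS -l nodes=4:ppn=2
#PBS -l walltime=00:30:00

# Boot LAM on the nodes PBS allocated to this job, run the MPI program,
# then shut LAM down again. ./my_mpi_program is a placeholder.
run_lam_job() {
    # $PBS_NODEFILE only exists inside a running PBS job.
    if [ -z "$PBS_NODEFILE" ]; then
        echo "PBS_NODEFILE is not set; submit this script with qsub" >&2
        return 1
    fi
    # PBS starts jobs in $HOME; move to the directory qsub was run from.
    cd "${PBS_O_WORKDIR:-.}" || return 1
    lamboot -v "$PBS_NODEFILE" || return 1
    mpirun C ./my_mpi_program
    status=$?
    lamhalt
    return "$status"
}

# Only do real work when PBS is actually running this script.
if [ -n "$PBS_JOBID" ]; then
    run_lam_job
    exit $?
fi
echo "Not inside a PBS job; submit with: qsub $0"
```

You would submit it from the head node with something like "qsub lam_job.sh" (the filename is arbitrary). If LAM was built with native PBS support, the lamboot line would need no $PBS_NODEFILE argument.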
Hope that helps.
On Oct 12, 2004, at 11:54 PM, zli_at_[hidden] wrote:
> Dear folks in the LAM/MPI community:
> My name is Zhiyi Li, and I am a research assistant at the Virginia
> Bioinformatics Institute at Virginia Tech. I am trying to install
> lam-mpi and have run into some problems. I hope I can find help here.
> My objective is to install lam-mpi on an IBM Linux cluster. The
> cluster is managed by the OpenPBS batch system, and parallel jobs can
> only be submitted to a batch queue through PBS scripts.
> The system includes one Linux server and 96 Linux cluster nodes.
> I installed lam-mpi in my own directory (for example,
> /home/zli/lammpi) and set up the appropriate paths; I can then run
> the lamboot command to start the lam-mpi environment.
> The problem is that lamboot only boots my Linux server node; it does
> not boot the other cluster nodes. I tried modifying the
> lam-bhost.def file to add the names of the other cluster nodes so
> that lam-mpi would boot them too, but it failed, reporting that it
> cannot access my Linux cluster nodes. My understanding is that PBS is
> a batch system, so MPI jobs (including the lamboot command) can only
> be submitted through PBS scripts, and lamboot run on the server alone
> cannot directly access the other cluster nodes. Is that true? Is
> there any special software or system setup that would let lamboot
> boot the lam-mpi environment on all the cluster nodes?
> Is there any way around the wall (PBS scripts) that would let
> lamboot boot the Linux cluster nodes? I am very eager to find a
> solution.
>
> Sincerely
> Zhiyi Li
> Virginia Bioinformatics Institute
> Virginia Tech
> (404)-(375)-(4141)
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/