LAM/MPI General User's Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2005-04-11 13:19:36


On Apr 11, 2005, at 11:50 AM, Dima Teplinskiy wrote:

> Thanks :) Sorry, I have been busy with things, so I am finally back
> and ready to tackle this cluster config!
> I figured out why I wasn't able to create the lamhosts file: the shared
> NFS directory from the walkthrough I was talking about is located at
> /mnt/sputnik, not /nnt/sputnik as the walkthrough said. I was able to
> create the lamhosts file in there with the list of nodes. However,
> when I try to run the command lamboot -v lamhosts, it gives me
>
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> -----------------------------------------------------------------------------
> The boot SSI rsh module found that your local host is not in the
> hostfile "lamhosts".
>
> The local host name *must* be in the list of hosts in the hostfile.
> In other words, you must boot LAM from a node that will be part of the
> universe.
>
> - If you simply forgot to put the local host in the boot
> schema file, add it and re-run The boot SSI rsh module
> - If you are trying to boot LAM from a node that will not be
> part of the universe, you must login to on of the nodes that
> will be part of the universe (i.e., one of the nodes in the
> hostfiles), and re-run The boot SSI rsh module
>
> Although the local host name is usually the first in the list to avoid
> I/O ambiguities, it can actually appear anywhere in the list.
>
> I have tried running that in the mnt directory and in the home
> directory, but it keeps giving me the same error. I am not sure what
> is causing this: I did add the nodes to the lamhosts file in order, and
> I installed LAM on the node following the same process I used on the
> head computer.

Can you confirm that the host you ran 'lamboot -v lamhosts' on (its
name is what the 'hostname' command prints) is included in the
'lamhosts' file?

LAM/MPI requires that the node that you use to start the LAM daemons is
also in the boot schema. If this doesn't solve the problem then you may
want to take a look at the debug information from lamboot which can be
generated by
  shell$ lamboot -v -d lamhosts
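
A quick way to run that check before calling lamboot could look like the
sketch below. It assumes the hostfile holds one host per line, with the
host name as the first field and '#' starting a comment; host_in_schema
is a made-up helper name, not a LAM/MPI command. It only compares the
text of the names, so a short name in the file versus a fully qualified
name from 'hostname' (or the reverse) will show up as a mismatch here
even where lamboot might accept it.

```shell
#!/bin/sh
# Sketch only: host_in_schema is a hypothetical helper, not part of LAM/MPI.
# Succeeds if host name $2 is the first field of some line in hostfile $1.
host_in_schema() {
    schema=$1
    host=$2
    # Strip '#' comments, keep the first field of each line (anything
    # after it, such as cpu=2, is ignored), then look for an exact match.
    sed 's/#.*//' "$schema" | awk '{ print $1 }' | grep -Fxq -- "$host"
}

# Usage: compare what 'hostname' prints against ./lamhosts, if it exists.
if [ -f lamhosts ]; then
    if host_in_schema lamhosts "$(hostname)"; then
        echo "$(hostname) is listed in lamhosts"
    else
        echo "$(hostname) is NOT listed in lamhosts"
    fi
fi
```

If the local host turns out to be missing, add it to the file (or log in
to one of the listed nodes) and run lamboot again.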

> You have mentioned something about PATH. I am not sure
> what that is or how I can set it up :/ Thanks for your help!!! I
> really appreciate it
>

The PATH environment variable tends to be very important when setting
up LAM/MPI in a Unix environment. Take a look at section 4.1.1 of the
LAM/MPI Users manual for information about setting this up.
http://www.lam-mpi.org/using/docs/
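
As a rough illustration of what "setting up PATH" means for a
Bourne-style shell, here is a minimal sketch. LAM_PREFIX is an assumed
location (use whatever --prefix you gave to ./configure), and
path_prepend is a made-up helper name. The manual remains the reference;
in particular, for the remote nodes this usually needs to live in a
shell startup file that is also read for non-interactive rsh/ssh logins.

```shell
#!/bin/sh
# Sketch only. LAM_PREFIX is an assumption: set it to your install prefix.
LAM_PREFIX=${LAM_PREFIX:-/usr/local}

# path_prepend is a hypothetical helper: print the PATH-style list $2
# with directory $1 placed in front, unless $1 is already in the list.
path_prepend() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

PATH=$(path_prepend "$LAM_PREFIX/bin" "$PATH")
export PATH

# Show which lamboot, if any, the shell now finds.
command -v lamboot || echo "lamboot not found in PATH"
```

Running the last line on every node is a simple way to see whether each
one picks up the same installation.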

Hope that helps,
Josh

>
>
> On Tue, 5 Apr 2005 19:04:19 -0500, Josh Hursey <jjhursey_at_[hidden]>
> wrote:
>>
>> On Apr 5, 2005, at 6:32 PM, Dima Teplinskiy wrote:
>>
>>> Hey, wow, thank you so much! The ./configure CXX=g++ CC=gcc FC=g77
>>> F77=g77 seemed to do the trick; I was able to install it just fine.
>>
>> Awesome! :)
>>
>>>
>>> Now I have another question :) On the node I simply ran the .rpm
>>> file that I got off the site. Is that enough for the node to be able
>>> to compile, or do I have to follow the same steps that I did for
>>> installing on the head node?
>>
>> To be absolutely certain of the installation, I would follow the same
>> steps for installation as you did with the head node. That being said,
>> as long as the versions of LAM/MPI are the same across the cluster and
>> configured in the same way, LAM should run properly. You will want
>> to make sure that your $PATH is set up properly on each node, so that
>> you are using the proper installation of LAM/MPI.
>>
>>>
>>> Also, one more thing: I am new to this stuff. I got ssh working so
>>> that I can connect to the node without being prompted for a password,
>>> but I am not sure if there is anything else I need. I have been
>>> following this walkthrough
>>> http://tldp.org/HOWTO/Beowulf-HOWTO/x70.html and when I get to
>>> http://tldp.org/HOWTO/Beowulf-HOWTO/x202.html I don't know how to add
>>> the nodes to lamhosts. When I follow the example that says "cat >
>>> /nnt/wolf/lamhosts" I get an error saying bash:
>>> /nnt/sputnik/lamhosts: No such file or directory.
>>
>> Humm... seems like the /nnt/sputnik/ directory may be the problem. Is
>> it a network mounted directory?
>>
>> -----
>> $ cat > lamhosts
>> node1
>> node2
>> node3
>> <control D>
>> -----
>> This should create the lamhosts file in the current directory,
>> whether or not it already exists. Try doing that, then copy the file to
>> where you want it (e.g. /nnt/sputnik/lamhosts). Note also that you can
>> create the 'lamhosts' file with any editor, if that is easier. A 'man
>> bhost' will give you some more information about this file.
>>
>>>
>>> Also, following the walkthrough, I can't get the home directory of the
>>> head node to mount on the slave nodes. Does that make a difference?
>>> Should there be a shared folder between them for all of this to work
>>> right?
>>
>> It can make things easier, but an NFS-style mount like that is not
>> required for LAM to run properly. It just means that you must either
>> distribute the binaries to each node individually *or* use the '-s'
>> option to mpirun (see man mpirun for more details on how to use that
>> option). The latter is much easier than the former, especially on a
>> homogeneous cluster.
>>
>>> I am sorry for asking so many questions, but you seem to be
>>> the only person who was able to even come close to helping me do
>>> anything in Linux :D Thanks a lot, I really appreciate it!!
>>
>> No worries about the questions; that is the reason for the list. Keep
>> the LAM questions coming! :)
>> There is a large community of folks monitoring this list who have
>> asked themselves the same questions at some point and who can provide
>> experienced answers as well.
>>
>> Cheers,
>> Josh
>>
>>
>>>
>>>
>>> On Tue, 5 Apr 2005 09:18:08 -0500, Josh Hursey <jjhursey_at_[hidden]>
>>> wrote:
>>>>
>>>> On Apr 4, 2005, at 4:08 PM, Dima Teplinskiy wrote:
>>>>
>>>>> Hello, I am not sure if I should be writing to this address,
>>>>
>>>> Yep, you are in the right place :)
>>>>
>>>>> but I am
>>>>> running Red Hat 9.0 and when I try to follow the instructions on
>>>>> configuring LAM/MPI after unzipping it, I get an error saying that
>>>>>
>>>>> configure: WARNING: *** Your C++ compiler does not support the bool data type.
>>>>> configure: WARNING: *** LAM requires a C++ compiler with support for the bool
>>>>> configure: WARNING: *** data type.
>>>>>
>>>>
>>>> The config.log should provide the exact reason for the failure
>>>> [however, I was unable to open the attached file :( ].
>>>> This warning is not necessarily caused by bool being unsupported;
>>>> it may be that the C++ libraries are not installed correctly, or
>>>> that something else is preventing the compiler from finding the
>>>> 'bool' type. Generally, if you can compile and run the sample code
>>>> below with your specified C++ compiler, everything should be fine:
>>>> --------------------
>>>> #include <iostream>
>>>> using namespace std;
>>>>
>>>> int main(int argc, char **argv) {
>>>>     bool foo = true;
>>>>     cout << "Hello, world! " << foo << endl;
>>>>     return 0;
>>>> }
>>>> ----------------------
>>>> Should give the following output:
>>>> Hello, world! 1
>>>>
>>>> You may try to explicitly specify your C++ compiler for the
>>>> configure script via the CXX argument. Something like:
>>>> ./configure CXX=g++ CC=gcc FC=g77 F77=g77
>>>>
>>>>>
>>>>> I will attach the full output of configure to this email, but could
>>>>> you please tell me how I can get it to work properly? When I try to
>>>>> type "make" after all of this, it says
>>>>>
>>>>
>>>> I was unable to read the config output. Could you [b | g]zip the
>>>> config.log and either send it to the list again or directly to me
>>>> (to save everyone from receiving a monster file in their inbox)? :)
>>>>
>>>> Josh
>>>>
>>>>> make: *** No targets specified and no makefile found. Stop.
>>>>>
>>>>> So if you could help me out, I would really appreciate it.
>>>>> <config output.sxw>
>>>>> _______________________________________________
>>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>> ----
>>>> Josh Hursey
>>>> jjhursey_at_[hidden]
>>>> http://www.lam-mpi.org/
>>>>
>>>>
>>>>
>> ----
>> Josh Hursey
>> jjhursey_at_[hidden]
>> http://www.lam-mpi.org/
>>
>>
>>

----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/