On Apr 3, 2007, at 10:31 AM, Van-Khiem Truong wrote:
> You are right about the number of IP addresses on the machine. The
> cluster has two IP addresses. I thought that the monoprocessor machine
> had only one, but I just asked the system administrator, who told me
> that it has two as well. He will remove one from the monoprocessor
> machine, but that is not possible for the cluster.
>
> How can I make LAM-MPI cope with a machine that has two IP
> addresses?
LAM should be able to handle machines with multiple IP addresses.
It's been a long, long time since I've looked at this code in LAM,
but as long as your machine can accept connections on arbitrary
ports on IP address 125.1.2.17, it should work fine...?
Can you verify that this is the correct IP address for your machine,
and that arbitrary incoming socket connections can be made to that
address from the localhost? Perhaps try running NetPIPE's TCP
bandwidth tester on your localhost using that IP address...?
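If NetPIPE isn't handy, the Python 3 sketch below checks the same basic thing. It prints the IPv4 addresses your hostname resolves to (on a machine with two addresses, the name may resolve to a different address than you expect), then listens on an OS-chosen port on a given address and connects to it from the same machine. This is not LAM code and does not reproduce exactly what lamd does. The script and function names are made up for illustration, and the sketch assumes Python 3 is installed on the node.

```python
import socket
import sys


def addresses_for_host(hostname):
    """Return the distinct IPv4 addresses the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def can_connect_back(ip, timeout=5.0):
    """Listen on an arbitrary free port on `ip`, connect to it from this
    same host, and return True if the connection is accepted."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((ip, 0))  # port 0: let the OS choose a free port
            server.listen(1)
            server.settimeout(timeout)
            port = server.getsockname()[1]
            with socket.create_connection((ip, port), timeout=timeout):
                conn, _ = server.accept()
                conn.close()
        return True
    except OSError:
        # Covers "address not available" (ip is not local), refused
        # connections, and timeouts (socket.timeout is an OSError).
        return False


if __name__ == "__main__":
    # Default to loopback; pass the address to test, e.g. 125.1.2.17.
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    host = socket.gethostname()
    try:
        print(f"{host} resolves to: {addresses_for_host(host)}")
    except socket.gaierror as err:
        print(f"{host} does not resolve: {err}")
    status = "ok" if can_connect_back(target) else "FAILED"
    print(f"listen/connect on {target}: {status}")
```

Run it as, e.g., `python3 check_ip.py 125.1.2.17`. A FAILED result for that address would suggest a problem with the host's network setup rather than with LAM itself.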
> Thank you and best regards,
>
> V. Khiem Truong
> Onera -France
>
>
>> On Apr 2, 2007, at 9:35 AM, Van-Khiem Truong wrote:
>>
>>> Hello Jeff Squyres,
>>>
>>> Thank you for your quick response. That is really odd! I spent
>>> some time investigating the problem.
>>>
>>> (1) You are right about the configuration without "memory-manager";
>>>
>>> (2) There is no firewall software running;
>>>
>>> (3) Instead of using the multiprocessor machine, I installed
>>> LAM-MPI on a single-processor machine with the same Itanium
>>> processor. Then I ran lamboot with only that Itanium workstation
>>> (with two workstations, the result is the same). It produces the
>>> same error message as before, as you can see in the following file:
>>
>> It's actually not hboot that is failing, but the lamd (hboot is
>> mainly a wrapper around fork/exec'ing the lamd). The lamd is trying
>> to open a socket back to 125.1.2.17 port 62915 (which *should* be the
>> same as the local host).
>>
>> Do you, perchance, have multiple IP addresses on this machine? I'm
>> wondering if LAM is using the "wrong" IP address such that it can't
>> open a socket back to 125.1.2.17 properly.
>
>> --
>> Jeff Squyres
>> Cisco Systems
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems