I corrected the configure command, and laminfo now shows that I do have the
rsh boot module.
Thanks for pointing that out.
But I'm still not able to lamboot. Here is the output of laminfo:
LAM/MPI: 7.0.6
Prefix: /scratch/MYLAM/
Architecture: i686-pc-linux-gnu
Configured by: talmas
Configured on: Wed Dec 22 13:05:28 EST 2004
Configure host: ccbm-cn02.ccbm.jhu.edu
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (Module v0.5)
SSI boot: rsh (Module v1.0)
SSI coll: lam_basic (Module v7.0)
SSI coll: smp (Module v1.0)
SSI rpi: crtcp (Module v1.0.1)
SSI rpi: lamd (Module v7.0)
SSI rpi: sysv (Module v7.0)
SSI rpi: tcp (Module v7.0)
SSI rpi: usysv (Module v7.0)
And the lamboot output is:
-------------------------------------------------------------
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
n-1<21973> ssi:boot:base:linear: booting n0 (ccbm-cn02)
n-1<21973> ssi:boot:base:linear: booting n1 (ccbm-cn03)
ERROR: LAM/MPI unexpectedly received the following on stderr:
bash: line 1: hboot: command not found
-----------------------------------------------------------------------------
LAM failed to execute a LAM binary on the remote node "ccbm-cn03".
Since LAM was already able to determine your remote shell as "hboot",
it is probable that this is not an authentication problem.
LAM tried to use the remote agent command "/usr/bin/ssh"
to invoke the following command:
/usr/bin/ssh -x ccbm-cn03 -n hboot -t -c lam-conf.lamd -v -s -I "-H 192.168.137.102 -P 33678 -n 1 -o 0"
This can indicate several things. You should check the following:
- The LAM binaries are in your $PATH
- You can run the LAM binaries
- The $PATH variable is set properly before your
.cshrc/.profile exits
Try to invoke the command listed above manually at a Unix prompt.
You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.
When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<21973> ssi:boot:base:linear: Failed to boot n1 (ccbm-cn03)
n-1<21973> ssi:boot:base:linear: aborted!
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).
Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
n-1<21979> ssi:boot:base:linear: booting n0 (ccbm-cn02)
n-1<21979> ssi:boot:base:linear: booting n1 (ccbm-cn03)
ERROR: LAM/MPI unexpectedly received the following on stderr:
bash: line 1: tkill: command not found
-----------------------------------------------------------------------------
LAM failed to execute a LAM binary on the remote node "ccbm-cn03".
Since LAM was already able to determine your remote shell as "tkill",
it is probable that this is not an authentication problem.
LAM tried to use the remote agent command "/usr/bin/ssh"
to invoke the following command:
/usr/bin/ssh -x ccbm-cn03 -n tkill -v
This can indicate several things. You should check the following:
- The LAM binaries are in your $PATH
- You can run the LAM binaries
- The $PATH variable is set properly before your
.cshrc/.profile exits
Try to invoke the command listed above manually at a Unix prompt.
You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.
When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
n-1<21979> ssi:boot:base:linear: Failed to boot n1 (ccbm-cn03)
n-1<21979> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully
Is this still the 'static pthreads' problem? I am not using the -static
option anymore.
I'm attaching the latest config file.
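The error output above suggests invoking the failing command by hand, and "hboot: command not found" means the remote, non-interactive shell did not find hboot in its PATH. Below is a minimal sketch of that check. It assumes the LAM binaries live in /scratch/MYLAM/bin (the bin directory under the prefix laminfo reports) and that this directory is visible on ccbm-cn03. The function lam_binary_on_path is a hypothetical helper, not part of LAM.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM): succeed if the named program
# can be found when PATH is set to the given value.
lam_binary_on_path() {
    bin_name=$1
    search_path=$2
    # Use a subshell so the caller's PATH is left untouched.
    ( PATH=$search_path; command -v "$bin_name" >/dev/null 2>&1 )
}

# Assumed install location: <prefix>/bin for the prefix laminfo reports.
lam_bin=/scratch/MYLAM/bin

# Local check against the assumed directory only.
if lam_binary_on_path hboot "$lam_bin"; then
    echo "hboot found in $lam_bin"
else
    echo "hboot not found in $lam_bin"
fi

# Remote check, to be run by hand. The single quotes make the remote,
# non-interactive shell expand $PATH, so this shows the PATH that a
# command started through ssh actually gets on that node.
#   ssh -x -n ccbm-cn03 'echo $PATH; command -v hboot; command -v tkill'
```

If the remote check prints a PATH without the LAM bin directory, the shell startup file read for non-interactive logins on ccbm-cn03 is the likely place to add it.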
Thanks,
Tabish.
=================================================================================================
-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of
Jeff Squyres
Sent: Wednesday, December 22, 2004 12:55 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: Lamboot problems
The real question is why you don't have the rsh boot module, which is what
it should be using.
I think the answer lies in the command you used to configure LAM. I notice
that you included the option:
"--with-rsh=--with-rsh=/usr/bin/ssh -x"
I think you accidentally listed --with-rsh in there twice. Change it
to:
"--with-rsh=/usr/bin/ssh -x"
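As a rough sketch of the corrected invocation: the prefix below is an assumption (it is the one laminfo reports elsewhere in this thread), and any other options you normally pass to configure would be added alongside it. The dry run only prints the arguments, so the quoting can be checked before configure is actually run.

```shell
#!/bin/sh
# Sketch: assemble the configure arguments with --with-rsh given exactly once.
# The prefix is an assumption taken from the laminfo output in this thread.
prefix=/scratch/MYLAM
rsh_agent="/usr/bin/ssh -x"

# Quoting keeps "/usr/bin/ssh -x" together as one value for --with-rsh.
set -- --prefix="$prefix" --with-rsh="$rsh_agent"

# Dry run: print each argument on its own line to confirm the quoting.
printf '%s\n' "$@"

# When the output looks right, run this from the LAM source directory:
#   ./configure "$@"
```

After rebuilding and installing, laminfo should list an "SSI boot: rsh" line if the module was built.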
On Dec 22, 2004, at 12:31 PM, Tabish Almas wrote:
>
> Thanks for the reply.
>
> I went ahead and compiled lam with gnu compilers.
> But still I get the same error during lamboot,
>
> -----------------------------------------------------------------------------
> No SSI boot modules said that they were available to run. This should
> not happen.
> -----------------------------------------------------------------------------
>
>
> I'm attaching the new config.log file.
>
>
> Sorry but I didn't understand the 'pthread' and 'glibc' problems that
> you mentioned.
> Could you please explain it.
>
>
> Thanks,
> Tabish
>
>
>
>
>
>
> -----Original Message-----
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On
> Behalf Of Tim Prince
> Sent: Wednesday, December 22, 2004 11:27 AM
> To: General LAM/MPI mailing list; General LAM/MPI mailing list;
> lam_at_[hidden]
> Subject: Re: LAM: Lamboot problems
>
> At 08:09 AM 12/22/2004, Tim Prince wrote:
>
>> At 07:40 AM 12/22/2004, Tabish Almas wrote:
>>
>>> Hi,
>>>
>>> I am trying to install Lam 7.0.6 on our Linux cluster with Intel
>>> nodes.
>>>
>>> I have compiled lam 7.0.6 with intel compiler version 8.0.
>>
>> Might be worthwhile to upgrade
>>
>>> I set the following variables before using the configure script:
>>>
>>> export CC=icc
>>> export CXX=icc
>>
>> and specify the C++ compiler icpc from the beginning, rather than
>> depending on lam configure to find it.
>> Building with icc and g77 is a bit odd, but it may be OK.
>>
>>> export CFLAGS="-O3 -static -static-libcxa"
>>> export CXXFLAGS="-O3 -static -static-libcxa"
>>
>> not sure if -static has an adverse effect. For example, with RH9,
>> you must upgrade to a current kernel in order to support static
>> pthreads library.
>
> Pardon me, that's a glibc upgrade.
>
>
>>> I've attached the zipped config.log file.
>>> I'd appreciate it if someone could help me.
>>
>> It says you don't have pthreads support for the way you have
>> configured.
>>
>>
>> Tim Prince
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
> Tim Prince
>
> <config.log.gz>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/