Hi folks,
LAM/MPI version mismatch problems seem to come up *very* frequently on
this list.
After all, it is not obvious for end users to be sure they have set
their PATH correctly, since it may differ between a normal login shell
and an rsh call. Also, on poor man's clusters, a version mismatch behind
an identical PATH is always possible.
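The check Brian suggests in the quoted reply below (`ssh localhost which lamboot`) can be run against every node before booting. What follows is a minimal POSIX sh sketch. The helper name `check_path` and the `REMOTE_SH` variable are hypothetical, not part of LAM. It assumes a boot schema with one host per line, the host name as the first field, and '#' starting a comment. It only catches PATH problems: it cannot detect two different LAM versions installed under the same path on different nodes.

```shell
# Hypothetical helper (not part of LAM): for each host in a boot schema,
# ask a non-interactive remote shell which lamboot it finds and flag any
# host where that is not the copy under the expected prefix.
# Assumption: REMOTE_SH (hypothetical) names the remote shell, default ssh.

# Usage: check_path HOSTFILE EXPECTED_BIN_DIR
check_path() {
    hostfile=$1
    expected="$2/lamboot"
    rsh=${REMOTE_SH:-ssh}
    status=0
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%%#*}              # drop comments
        set -f                        # split the line without globbing
        set -- $line
        set +f
        [ $# -gt 0 ] || continue      # skip blank lines
        host=$1                       # first field is the host name
        # </dev/null stops the remote shell from eating the host list
        found=$($rsh "$host" which lamboot 2>/dev/null </dev/null)
        if [ "$found" = "$expected" ]; then
            echo "ok:  $host -> $found"
        else
            echo "BAD: $host -> ${found:-lamboot not found} (expected $expected)"
            status=1
        fi
    done < "$hostfile"
    return $status
}
```

For example, `check_path lamhosts /usr/local/lam/lf95/bin` prints one line per host and returns non-zero if any host would pick up another lamboot. Set `REMOTE_SH="ssh -x"` beforehand to pass options to the remote shell.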
I wonder whether it would be possible to automate version checking
during the lamboot process. That would turn most of these errors into a
clear message giving the version found on the home node and on the
mismatching ones.
This would require the preliminary boot phase to use an additional,
version-independent acknowledgement, which would fail with an explicit
message for all newer versions and report a less explicit version
mismatch for older versions.
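As a rough illustration of the home-node side of this proposal, here is a hypothetical comparison in POSIX sh. The function `check_versions` is an illustrative assumption, and so is the reply format `VERSION <string>`, standing in for the version-independent acknowledgement. LAM has no such message today. The sketch assumes the replies have already been collected as `host=reply` pairs. It prints an explicit message for each node whose version differs, and a less specific one for a node that cannot produce the acknowledgement (for example, an older LAM).

```shell
# Hypothetical sketch (not LAM code) of the proposed home-node check.
# Each argument after the home version is "host=reply", where reply is
# assumed to be the fixed-format ack "VERSION <string>".

# Usage: check_versions HOME_VERSION HOST=REPLY...
# Prints one message per mismatching node; returns 1 if any mismatch.
check_versions() {
    home_version=$1
    shift
    status=0
    for entry in "$@"; do
        host=${entry%%=*}             # text before the first '='
        reply=${entry#*=}             # text after the first '='
        case $reply in
            "VERSION "?*)
                version=${reply#"VERSION "}
                if [ "$version" != "$home_version" ]; then
                    echo "version mismatch: home node runs $home_version, $host runs $version"
                    status=1
                fi
                ;;
            *)
                # No recognizable ack: likely a version predating the scheme
                echo "version mismatch: $host sent no recognizable version ack (older LAM?)"
                status=1
                ;;
        esac
    done
    return $status
}
```

For example, `check_versions 7.1.1 "node1=VERSION 7.1.1" "node2=VERSION 7.0.3"` prints `version mismatch: home node runs 7.1.1, node2 runs 7.0.3` and returns 1. Keeping the acknowledgement format fixed is what lets any later version produce the explicit message.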
Ideally, the library itself should also check that all executables have
been linked against the same version, because a mismatch may later
cause hangs or even wrong results.
At the very least, such an automatic consistency check might be a useful
requirement for Open-MPI development.
Best,
Pierre.
Brian Barrett wrote:
>On Jun 21, 2005, at 10:02 PM, Madhurjya P. Bora wrote:
>
>>I have lam-7.0.3 running successfully on my standalone Fedora Core 2
>>machine (localhost), which I use for testing. This lam-7.0.3 came
>>as an RPM with the system.
>>
>>When I built lam-7.1.1 from the .tar.bz2 package for the
>>Lahey-Fujitsu FORTRAN 95 compiler, the build completed successfully. But
>>lamboot from the newly built package complains about random TCP
>>ports. However, recon is successful!
>>
>>My only configure option was a prefix directory, i.e. ./configure
>>-prefix=/usr/local/lam/lf95.
>>The old LAM still boots! I'm using SSH-2. Kindly help!
>
>Tim is right - you should read the error message before posting ;).
>Unfortunately, you didn't include enough information for me to be able
>to help you. There are a couple of different things that could be the
>problem. First, the default RPM install is going to be in /usr/bin.
>Make sure that /usr/local/lam/lf95/bin appears in your path before
>/usr/bin. That means you should be able to do:
>
> ssh localhost which lamboot
>
>And see "/usr/local/lam/lf95/bin/lamboot". If you see
>"/usr/bin/lamboot", your path is not set up correctly. Please
>see the LAM FAQ for more information about the requirements for setting
>up your path.
>
>It's also possible that there is a problem with your installation of
>LAM. If you are still having problems once you are sure your path is
>set up correctly, please send the output of lamboot with the "-d -v"
>flags specified (in addition to your normal arguments). This will
>give a lot of diagnostic information that can be useful in figuring
>out what is going on. Post that output to this list and we should be
>able to track down the problem.
>
>Hope this helps,
>
>Brian
>
--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/
Dr. Pierre VALIRON
Laboratoire d'Astrophysique
Observatoire de Grenoble / UJF
BP 53 F-38041 Grenoble Cedex 9 (France)
http://www-laog.obs.ujf-grenoble.fr
mail: Pierre.Valiron_at_[hidden]
Phone: +33 4 7651 4787 Fax: +33 4 7644 8821