Hi Tim,
>> 4) It _could_ be possible that I have mixed
>> two versions of LAM (installed is a 6.5.4(!),
>> I use a 7.1.beta). I tried to avoid problems
>> using a new $PATH, $LD_LIBRARY_PATH, and
>> -I/xxx in the Makefile. But I am not sure ...
>
>
> PATH must be set up on each node so that the new installation is used by
> lamboot and while running. I can't understand why updates of a
> supposedly supported version of Linux continue to use such an old version
> of lam (and binutils, gcc). If you configure your lam with
> --prefix=/usr it will over-write the old one entirely, and there is no
> question of mix-up. Yes, it will fail if the nodes don't all see the
> same lam installation on PATH.
If I had the root password I would do a clean install
of the new LAM version over the old version ...
What I did is:
1) ./configure --prefix=<my_lam_dir>
2) in the .bashrc:
export PATH="<my_lam_dir>/bin:$PATH"
export LD_LIBRARY_PATH="<my_lam_dir>/lib:$LD_LIBRARY_PATH"
3) in the Makefile:
mpiCC -I<my_lam_dir>/include
This seems to work because I can run LAM programs.
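To rule out a mix-up between the two installations, a check like the following could be run on every node. This is only a sketch: check_prefix is a hypothetical helper (not part of LAM), and LAM_DIR is an assumed variable standing in for <my_lam_dir>. It only reports where the shell finds each command and says nothing about which libraries are loaded at run time.

```shell
# Hypothetical helper (not part of LAM): report whether a command found on
# PATH lives under the expected installation prefix.
# Usage: check_prefix <command> <prefix>
check_prefix() {
    cmd=$1
    prefix=$2
    # command -v prints the path the shell would execute for $cmd
    resolved=$(command -v "$cmd" 2>/dev/null) || {
        echo "$cmd: not found on PATH"
        return 1
    }
    case "$resolved" in
        "$prefix"/*) echo "$cmd: OK ($resolved)" ;;
        *) echo "$cmd: MISMATCH ($resolved)"; return 1 ;;
    esac
}

# Example (assumes LAM_DIR holds the --prefix given to ./configure):
#   for c in lamboot mpirun mpiCC; do check_prefix "$c" "$LAM_DIR"; done
```

Running the loop through the same remote shell that lamboot uses is probably more telling than running it in a login shell, since a non-interactive shell may not read the same startup files.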
My problem is the Infiniband RPI. Using 'mpirun ... -ssi rpi ib ...'
I get the errors:
----------------------
An erroneous completion was generated while polling for the Infiniband
completion queue
The exact error string returned by Infiniband API is as follows:
"Operation Completed Successfully"
----------------------
So it seems the Infiniband API returns success, but LAM treats this
as an error ...
Any ideas?
Bye and thanks,
Charlie