Hello,
After setting up a disk array on my head node, I'm having some trouble
getting everything to work correctly again. I believe I re-installed
LAM/MPI, xlf, and some other libraries in the same order I did
originally, but code that used to run now breaks at run time.
Has anyone seen this error before?
*** malloc[20312]: error for object 0x505a90: Incorrect checksum for
freed object - object was probably modified after being freed; break at
szone_error
I'm running:
8 dual-processor G5 Xserves
Mac OS X Server 10.3.8 with the latest security updates
gcc version 3.3 20030304 (build 1671)
xlf 8.1
lam-7.2b1r10036, built with -O3 -qtune=auto -qarch=auto
The code compiles and runs in serial (without MPI).
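For what it's worth, my reading of the error is that something wrote to
a heap block after it was freed. As a minimal sketch (hypothetical, not
taken from my actual code), this is the kind of Fortran bug that can
trigger that class of error:

   program heapbug
     implicit none
     integer, pointer :: p(:)
     allocate(p(10))
     p = 0
     deallocate(p)
     ! Write-after-free: p is dangling here, so this scribbles on a
     ! freed block and can corrupt the allocator's bookkeeping.
     p(1) = 42
   end program heapbug

On Mac OS X, setting the environment variable MallocScribble=1 before
running can sometimes make this kind of corruption surface earlier,
closer to the real culprit.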
Any pointers would be appreciated.
-Ed
..............................................
Edward D. Zaron, PhD
Research Associate
College of Oceanic and Atmospheric Sciences
Oregon State University
Corvallis, OR 97331-5503
Phone: (541) 737-3504
Fax: (541) 737-2064
ezaron_at_[hidden]
...............................................
On Feb 16, 2005, at 9:01 AM, lam-request_at_[hidden] wrote:
> Send lam mailing list submissions to
> lam_at_[hidden]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.lam-mpi.org/mailman/listinfo.cgi/lam
> or, via email, send a message with subject or body 'help' to
> lam-request_at_[hidden]
>
> You can reach the person managing the list at
> lam-owner_at_[hidden]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lam digest..."
> Today's Topics:
>
> 1. Re: can't configure lam-7.1.1 with Intel 8.1 Fortran
> (Jeff Squyres)
> 2. 7.1.2b13 (Jeff Squyres)
> 3. Re: lam Digest, Vol 321, Issue 3 (Rodney Mach)
> 4. Re: lam Digest, Vol 321, Issue 2 (Rodney Mach)
> 5. Re: Re: SGE - LAM Integration / hboot.c setsid() (Reuti)
> 6. Re: can't configure lam-7.1.1 with Intel 8.1 Fortran
> (Damien Hocking)
> 7. Re: Re: SGE - LAM Integration / hboot.c setsid() (Bogdan
> Costescu)
> 8. Re: Re: lam Digest, Vol 321, Issue 2 (Hugh Merz)
> 9. Re: Re: SGE - LAM Integration / hboot.c setsid() (Jeff Squyres)
> 10. Octave and MPITB (Christian F. Vélez Witrofsky)
> 11. Re: Octave and MPITB (Nelson Brito)
>
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Date: February 16, 2005 4:18:40 AM PST
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Subject: Re: LAM: can't configure lam-7.1.1 with Intel 8.1 Fortran
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> On Feb 15, 2005, at 4:20 PM, Damien Hocking wrote:
>
>> ./conftest: error while loading shared libraries: libcxaguard.so.5:
>> cannot open shared object file: No such file or directory
>
> This is the problem (from config.log). Configure tried to compile
> something with ifort and then tried to run it, but got this shared
> library error.
>
> This typically means that there is something missing from your
> LD_LIBRARY_PATH.
>
> You can double check this by trying to compile anything with ifort
> (even outside of MPI) and then running it. Once you can compile / run
> with no errors like this (e.g., by fixing up your LD_LIBRARY_PATH),
> you should be able to run LAM's configure successfully.
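>
> For example, a trivial sanity test (hello.f90 is just a hypothetical
> file name):
>
>    ! hello.f90 -- compile with "ifort hello.f90 -o hello" and then
>    ! run ./hello; if the libcxaguard.so.5 error shows up here too,
>    ! the problem is in your environment, not in LAM.
>    program hello
>      print *, 'ifort runtime looks OK'
>    end program hello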
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>
>
>
>
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Date: February 16, 2005 6:22:03 AM PST
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Subject: LAM: 7.1.2b13
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> I've just released LAM 7.1.2b13 (http://www.lam-mpi.org/beta/).
>
> It contains all the latest fixes and changes that have been talked
> about on this list and the devel list.
>
> We're really zeroing in on the final 7.1.2 release -- this will
> hopefully be the final beta.
>
> If you have a little time, we'd appreciate it if you could give this
> beta a whirl.
>
> Thanks!
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>
>
>
>
> From: Rodney Mach <rwm_at_[hidden]>
> Date: February 16, 2005 6:24:11 AM PST
> To: lam_at_[hidden]
> Subject: LAM: Re: lam Digest, Vol 321, Issue 3
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
>
> Hello,
>
> Make sure your LD_LIBRARY_PATH is set to the EM64T library path when
> you are using the EM64T 64-bit compilers. If it is set for the
> non-EM64T version of the 8.1 compiler (i.e., the 32-bit version), you
> will get the same error you describe.
>
> If you are using the EM64T compiler, this should be in your
> LD_LIBRARY_PATH:
> /opt/intel_fce_80/lib
> NOT this:
> /opt/intel_fc_80/lib
>
> (Note the extra "e" in there; it is easy to miss at a quick glance.)
>
> When using the 32-bit 8.1 compiler, the opposite is true.
>
>
> -Rod
>
> Rod Mach
> HPC Technical Director
> Absoft Corporation
>
>> 3. Re: can't configure lam-7.1.1 with Intel 8.1 Fortran
>> (Damien Hocking)
>>>
>>> This is odd. I just switched over to using the 8.1 Fortran compiler,
>>> EM64T version, and the configure step fails trying to find the size
>>> of a Fortran integer. No fancy flags set, just plain FC=ifort. I
>>> looked around the Intel forums but couldn't find anything. Anyone
>>> got this to work? All fine on 7.0, 7.1, and g95.
>>>
>>> Damien
>>>
>
>
>
>
>
>
>
> From: Rodney Mach <rwm_at_[hidden]>
> Date: February 16, 2005 6:24:20 AM PST
> To: lam_at_[hidden]
> Subject: LAM: Re: lam Digest, Vol 321, Issue 2
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> The TAU tools are good. They are actively maintained and support a
> wide variety of configurations:
>
> http://www.cs.uoregon.edu/research/paracomp/tau/tautools/
>
>
> -Rod Mach
>
> HPC Technical Director
> Absoft Corporation
> http://www.absoft.com
>
>> On Feb 14, 2005, at 12:10 PM, Shi Jin wrote:
>>> Hi,
>>>
>>> We have an MPI code working and would like to know how much time is
>>> spent in message passing compared to the total time. The total time
>>> is easy to measure by calling MPI_WTIME at the beginning and end of
>>> the code. But instead of adding timing around each MPI call and
>>> summing them up, is there an easier way to get the total time spent
>>> in all MPI calls?
>>> Thanks a lot.
>>> Shi
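>
> Another lightweight route, if a full tracing tool is more than you
> need, is the MPI profiling (PMPI) layer: every MPI routine also
> exists under a PMPI_ name, so you can interpose your own timing
> wrapper without touching the application. A rough sketch for one
> routine (the common-block name is made up for the example; link the
> wrapper ahead of the MPI library):
>
>       subroutine MPI_SEND(buf, count, datatype, dest, tag, comm, ierr)
>       implicit none
>       include 'mpif.h'
>       integer buf(*), count, datatype, dest, tag, comm, ierr
>       double precision t0
>       double precision mpitot
>       common /mpitime/ mpitot
>       t0 = MPI_WTIME()
>       call PMPI_SEND(buf, count, datatype, dest, tag, comm, ierr)
>       mpitot = mpitot + (MPI_WTIME() - t0)
>       end
>
> You would need one such wrapper per MPI routine you care about, which
> is why tools like TAU that generate them automatically are
> attractive.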
>
>
>
>
>
>
> From: Reuti <reuti_at_[hidden]>
> Date: February 16, 2005 6:25:54 AM PST
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Subject: Re: LAM: Re: SGE - LAM Integration / hboot.c setsid()
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> Jeff,
>
> When TM support in SGE becomes a reality, I think the PBS startup
> already implemented in LAM could be used. But it will still take some
> time until that happens.
>
> So I will also look into the SSI stuff, just out of personal
> interest, to see whether there is any easy way to get it working with
> the current version of SGE.
>
> Cheers - Reuti
>
> Jeff Squyres wrote:
>> If you really want to work on it, the proper way would be to write a
>> boot SSI module that understands SGE. hboot would likely not be used
>> (e.g., it's not used in a TM environment) -- it's a holdover from the
>> bad-old rsh/ssh days -- I can explain more if you care. The boot SSI
>> docs are on our documentation web page.
>> We had extensive discussions with the SGE guys about this a while
>> ago, and their feeling was that the script-based approach was
>> simpler, which is why we never bothered to write a boot SSI module.
>> But I'm a purist; if I had the cycles, I'd like to see an SGE boot
>> SSI module (it would be an easier experience for the sysadmin, too).
>> More specifically, I'd like to see this kind of support in Open MPI
>> -- we're not doing too much new work in LAM these days.
>> I'd also like to see something better than a linear startup mechanism
>> in SGE -- just my $0.02. ;-)
>
>
>
>
>
> From: Damien Hocking <damien_at_[hidden]>
> Date: February 15, 2005 11:35:30 PM PST
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Subject: Re: LAM: can't configure lam-7.1.1 with Intel 8.1 Fortran
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> Thanks. <Insert sheepish look here>. I should have found that myself,
> obviously I can't read....
>
> Damien
>
> Jeff Squyres wrote:
>
>> On Feb 15, 2005, at 4:20 PM, Damien Hocking wrote:
>>
>>> ./conftest: error while loading shared libraries: libcxaguard.so.5:
>>> cannot open shared object file: No such file or directory
>>
>>
>> This is the problem (from config.log). Configure tried to compile
>> something with ifort and then tried to run it, but got this shared
>> library error.
>>
>> This typically means that there is something missing from your
>> LD_LIBRARY_PATH.
>>
>> You can double check this by trying to compile anything with ifort
>> (even outside of MPI) and then running it. Once you can compile / run
>> with no errors like this (e.g., by fixing up your LD_LIBRARY_PATH),
>> you should be able to run LAM's configure successfully.
>>
>
>
>
>
> From: Bogdan Costescu <Bogdan.Costescu_at_[hidden]>
> Date: February 16, 2005 6:56:52 AM PST
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Subject: Re: LAM: Re: SGE - LAM Integration / hboot.c setsid()
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> On Wed, 16 Feb 2005, Reuti wrote:
>
>> So I will look into the SSI stuff also just for personal interest,
>> whether there is any easy possibility to get it working with current
>> version of SGE.
>
> Given that I've already tried this, I would like to just say that the
> LAM part is easy. It's already working for:
>
> - rsh/ssh, starting lamd via hboot
> - TM, starting lamd directly
>
> so you can use these two boot modules as examples of the two extremes
> that could be used for an SGE boot module.
>
> The SGE part is more complicated, especially as there is no
> documentation for all the list operations -- that's where I stopped.
> If you have more time/determination/luck, then by all means go ahead
> :-)
>
> --
> Bogdan Costescu
>
> IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
> Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
> Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
> E-mail: Bogdan.Costescu_at_[hidden]
>
>
>
>
>
> From: Hugh Merz <merz_at_[hidden]>
> Date: February 16, 2005 6:59:18 AM PST
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Subject: Re: LAM: Re: lam Digest, Vol 321, Issue 2
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> There is also the Intel Trace Collector/Analyzer (formerly known as
> Vampir).
> Although it is pitched as running only on Intel processor-based
> clusters, it can currently be obtained free as an evaluation version:
>
> http://www.intel.com/software/products/cluster/
>
> I'm definitely going to take a look at TAU - it looks very promising.
>
> Hugh
>
>
> On Wed, 16 Feb 2005, Rodney Mach wrote:
>
>> The TAU tools are good. They are actively maintained and support a
>> wide variety of configurations:
>>
>> http://www.cs.uoregon.edu/research/paracomp/tau/tautools/
>>
>>
>> -Rod Mach
>>
>> HPC Technical Director
>> Absoft Corporation
>> http://www.absoft.com
>>
>>> On Feb 14, 2005, at 12:10 PM, Shi Jin wrote:
>>>> Hi,
>>>> We have an MPI code working and would like to know how much time
>>>> is spent in message passing compared to the total time. The total
>>>> time is easy to measure by calling MPI_WTIME at the beginning and
>>>> end of the code. But instead of adding timing around each MPI call
>>>> and summing them up, is there an easier way to get the total time
>>>> spent in all MPI calls?
>>>> Thanks a lot.
>>>> Shi
>>
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
>
>
>
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Date: February 16, 2005 7:13:19 AM PST
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Subject: Re: LAM: Re: SGE - LAM Integration / hboot.c setsid()
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
> I'd be a little cautious here...
>
> It's great news that SGE is considering the TM spec (I was previously
> unaware of that), but keep in mind that that spec is many years old,
> was written by an informal group, and I'm not aware of any other
> system that uses it. The PBS people were recently working on
> extending it -- if the goal in implementing TM in SGE is tool
> portability (e.g., for LAM, Open MPI, etc.), it would be worthwhile
> for the SGE folks to talk to the PBS people and see what is being
> done.
>
> Also, note that PBS implements the TM spec with some quirks -- quirks
> that we had to account for when we wrote the TM boot module. So
> unless SGE implements the same quirks, we could be in for some
> portability issues (e.g., if PBS TM is only sorta the same as SGE TM
> -- "different enough" to force some #if code to distinguish between
> the two).
>
>
> On Feb 16, 2005, at 9:25 AM, Reuti wrote:
>
>> Jeff,
>>
>> When TM support in SGE becomes a reality, I think the PBS startup
>> already implemented in LAM could be used. But it will still take
>> some time until that happens.
>>
>> So I will also look into the SSI stuff, just out of personal
>> interest, to see whether there is any easy way to get it working
>> with the current version of SGE.
>>
>> Cheers - Reuti
>>
>> Jeff Squyres wrote:
>>> If you really want to work on it, the proper way would be to write a
>>> boot SSI module that understands SGE. hboot would likely not be
>>> used (e.g., it's not used in a TM environment) -- it's a holdover
>>> from the bad-old rsh/ssh days -- I can explain more if you care.
>>> The boot SSI docs are on our documentation web page.
>>> We had extensive discussions with the SGE guys about this a while
>>> ago, and their feeling was that the script-based approach was
>>> simpler, which is why we never bothered to write a boot SSI module.
>>> But I'm a purist; if I had the cycles, I'd like to see an SGE boot
>>> SSI module (it would be an easier experience for the sysadmin, too).
>>> More specifically, I'd like to see this kind of support in Open MPI
>>> -- we're not doing too much new work in LAM these days.
>>> I'd also like to see something better than a linear startup
>>> mechanism in SGE -- just my $0.02. ;-)
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>
>
>
>
> From: Christian F. Vélez Witrofsky <cfvelez_at_[hidden]>
> Date: February 16, 2005 7:33:06 AM PST
> To: lam_at_[hidden]
> Subject: LAM: Octave and MPITB
> Reply-To: Christian F. Vélez Witrofsky <cfvelez_at_[hidden]>, General
> LAM/MPI mailing list <lam_at_[hidden]>
>
>
> Dear experts & users,
>
> I have spent some days looking around the Internet for the proper use
> of Octave with MPITB, and I have been unable to answer one very
> simple question:
>
> Do I need to install Octave on every node in a cluster to use Octave
> in a parallel program?
>
> If anyone will take the time to answer this, thank you very much.
>
> Christian F. Vélez Witrofsky
> University of Puerto Rico, Rio Piedras
> Natural Science Faculty, Dept. of Computer Science
>
>
>
>
>
> From: Nelson Brito <ntbrito_at_[hidden]>
> Date: February 16, 2005 8:07:13 AM PST
> To: "Christian F. Vélez Witrofsky" <cfvelez_at_[hidden]>, General
> LAM/MPI mailing list <lam_at_[hidden]>
> Cc:
> Subject: Re: LAM: Octave and MPITB
> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>
>
>
>> Do I need to install Octave on every node in a cluster to use Octave
>> in a parallel program?
>>
>
> You have several ways (three that I know of :-)) to run a parallel
> program with LAM/MPI:
> either you install it on every node of your cluster, always at the
> same path;
> or you install it on a shared filesystem;
> or you spawn the program to all the processors (see man mpirun).
>
> So one possible answer to your question is "yes, but I don't have
> to".
>
> Kind regards,
> nelson
>
>
>
|