I have run into this problem before and solved it by adding -lcr when
compiling with mpicc. Compiling with LAM/MPI 7.0, if you can use it,
also works.
mpicc -lcr ....
Cong
On Wed, 20 Oct 2004 22:13:51 +0800, Neville Lee
<neville.lee_at_[hidden]> wrote:
> Thanks for the reply.
>
> I tried version 7.1.2b5. When doing cr_checkpoint, it says:
> rploadgov failed.: No such file or directory
> The mpirun process is also terminated after cr_checkpoint, but the MPI
> program continues running.
> I configured LAM with
> --with-blcr=/usr/local --with-rpi=crtcp
>
> I also tried 7.0.6 and 7.2b1r9913 with the same configure parameters.
> 7.0.6 works without any problems, but 7.2b1r9913 has problems similar
> to those in 7.1.1.
>
> Any explanation for this?
>
>
>
> ---------- Forwarded message ----------
> From: Jeff Squyres <jsquyres_at_[hidden]>
> To: General LAM/MPI mailing list <lam_at_[hidden]>
> Date: Tue, 19 Oct 2004 11:05:17 -0400
> Subject: Re: LAM: cr_pthread.c:82 cri_pthread_init: When linking libpthread, it must be linked AFTER libcr
> <div class="moz-text-flowed" style="font-family: -moz-fixed">Sorry for the delay on this -- I looked into this and found that there
> are actually two issues here:
>
> - compiling MPI apps with checkpoint support using BLCR
> - using checkpoints at run-time
>
> The compiling issue turns out to be by design in BLCR -- the "cr"
> library must be linked in before libpthread (which is what you were
> seeing). In the DSO module case, the cr library is linked to the
> module (not the user's app), so it gets loaded into the process *after*
> libpthread, and there's really no way to get the ordering right.
> Hence, these components really need to be statically linked into libmpi
> (I've added release notes about this for 7.1.2).
>
> There are two main ways to do this:
>
> 1. Configure all LAM modules to be statically linked into libmpi. This
> is the default mode, so if you don't specify any of --enable-shared,
> --disable-static, or --with-modules, it should build this way.
>
> 2. Configure just the cr modules to be statically linked into libmpi.
> For example:
>
> ./configure --disable-static --enable-shared --with-modules=boot,coll,rpi
>
> (you can be a little more fine-grained than that if you want -- the
> above will also compile the self modules statically in libmpi, for
> example)
>
> The second issue is that we apparently accidentally disabled BLCR
> support altogether with a hackaround for a corner case that shouldn't
> matter (ducong mailed me about this off-list). I have fixed this in SVN
> and have released a new beta tarball with the fixes -- 7.1.2b5. Could
> you give it a whirl?
>
> http://www.lam-mpi.org/beta/
>
> Let me know how this goes.
>
> On Oct 17, 2004, at 3:27 PM, Neville Lee wrote:
>
> > I'm having the exact same problem.
> >
> > mpicc -showme:
> > gcc -I/usr/local/include -pthread -ldl -lpthread -L/lib -L/usr/local/lib -llammpio -llamf77mpi -lmpi -llam -lutil -lcr -ldl
> >
> > ldd a.out
> > libm.so.6 => /lib/libm.so.6 (0x4002c000)
> > libdl.so.2 => /lib/libdl.so.2 (0x4004e000)
> > libpthread.so.0 => /lib/libpthread.so.0 (0x40051000)
> > libutil.so.1 => /lib/libutil.so.1 (0x400a2000)
> > libcr.so.0 => /usr/local/lib/libcr.so.0 (0x400a6000)
> > libc.so.6 => /lib/libc.so.6 (0x400ad000)
> > /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
> >
> > With mpicc -v source.c 2> out, I can see that -lcr appears after
> > -lpthread in the argument list of collect2.
> > So I remove the message lines from the file 'out', leaving only the
> > commands, and run the file as a script. This produces an executable
> > that runs without complaints.
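The ordering check described above can be scripted. Below is a minimal POSIX sh sketch, assuming you pass it the words of a link command line (for example the collect2 line from `mpicc -v`, or the output of `mpicc -showme`). The function name `cr_link_order_ok` is hypothetical and is not part of LAM/MPI or BLCR. It only looks at literal `-lcr` and `-lpthread` words, so it will not notice libpthread pulled in through `-pthread` or through a full path to the library.

```sh
# cr_link_order_ok: hypothetical helper, not part of LAM/MPI or BLCR.
# Usage: cr_link_order_ok WORD...   (the words of a link command line)
# Returns 0 if -lcr comes before the first -lpthread (or there is no
# -lpthread), 1 if -lpthread comes first, 2 if -lcr is missing entirely.
cr_link_order_ok() {
    pos=0; cr_pos=0; pthread_pos=0
    for arg in "$@"; do
        pos=$((pos + 1))
        case "$arg" in
            # remember only the first occurrence of each library flag
            -lcr)      [ "$cr_pos" -eq 0 ] && cr_pos=$pos ;;
            -lpthread) [ "$pthread_pos" -eq 0 ] && pthread_pos=$pos ;;
        esac
    done
    [ "$cr_pos" -eq 0 ] && return 2        # libcr not linked at all
    [ "$pthread_pos" -eq 0 ] && return 0   # no explicit -lpthread word
    [ "$cr_pos" -lt "$pthread_pos" ]       # status 0 only if libcr is first
}
```

Fed the `mpicc -showme` output quoted earlier in this message, the function returns 1, because -lpthread (the 5th word) precedes -lcr (the 13th word).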
> >
> > And the ldd output of the new executable:
> > libm.so.6 => /lib/libm.so.6 (0x4002c000)
> > libdl.so.2 => /lib/libdl.so.2 (0x4004e000)
> > libcr.so.0 => /usr/local/lib/libcr.so.0 (0x40051000)
> > libpthread.so.0 => /lib/libpthread.so.0 (0x40058000)
> > libutil.so.1 => /lib/libutil.so.1 (0x400aa000)
> > libc.so.6 => /lib/libc.so.6 (0x400ad000)
> > /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
> > Apparently libcr appears before libpthread now.
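A similar sketch for the run-time side, again hypothetical (`cr_load_order_ok` is not a LAM/MPI or BLCR tool): it reads `ldd` output on stdin and compares the positions of the libcr and libpthread entries. It assumes glibc's ldd, which lists dependencies roughly in the order the dynamic loader maps them, so treat it as a quick heuristic rather than a guarantee of what BLCR's own check in cri_pthread_init will see.

```sh
# cr_load_order_ok: hypothetical helper, not part of LAM/MPI or BLCR.
# Reads `ldd` output on stdin. Exit status: 0 if libcr is listed before
# libpthread (or libpthread is absent), 1 if libpthread is listed first,
# 2 if libcr does not appear at all.
cr_load_order_ok() {
    awk '
        # The first field of each ldd line is the library soname.
        $1 ~ /^libcr\.so/      && !cr      { cr = NR }
        $1 ~ /^libpthread\.so/ && !pthread { pthread = NR }
        END {
            if (!cr) exit 2
            if (!pthread || cr < pthread) exit 0
            exit 1
        }
    '
}
```

For example, `ldd a.out | cr_load_order_ok; echo $?` prints 1 for the first ldd listing above and 0 for the second.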
> >
> > Is this a bug in mpicc?
> >
> > However, after that I can mpirun the program and run cr_checkpoint, but
> > when I call cr_restart, it says:
> > mpirun (rpwait): Bad file descriptor
> > Any ideas?
> >
> > BTW, I'm using LAM/MPI 7.1.1 and BLCR 0.2.3.
> >
> >> Can you send the output of "mpicc -showme" and "ldd a.out"?
> >>
> >> What version of LAM are you using?
> >>
> >>
> >> On Oct 12, 2004, at 1:51 PM, <ducong_at_xxxxxxxxx> wrote:
> >>
> >>
> >>
> >> Hi,
> >> When I try to run an MPI program, I get the following error:
> >> $ mpirun -ssi rpi crtcp -np 1 a.out
> >> cr_pthread.c:82 cri_pthread_init: When linking libpthread, it must be
> >> linked AFTER libcr
> >>
> >> My configuration is as follows:
> >> $ ./configure --with-cr-blcr=/usr/local/blcr --with-rpi=crtcp
> >> --prefix=/home/ducong/lam --with-rsh=ssh -x
> >>
> >> How can I solve this problem?
> >> Thanks
> >> _______________________________________________
> >> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >>
> >>
> >> -- {+} Jeff Squyres {+} jsquyres_at_xxxxxxxxxxx {+}
> >> http://www.lam-mpi.org/
> >>
> >>
> >>
> >
> >
> >
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
> </div>
>