LAM/MPI General User's Mailing List Archives

From: Mark (mark_at_[hidden])
Date: 2004-09-08 17:21:12


I successfully built LAM 7.1b20 on Cygwin 1.5.11-1. I think my previous
difficulty came from using the default memory-manager setting; this time I
passed --with-memory-manager=none to configure. The main clue was that the
seg fault occurred in a getenv call made from LAM's ptmalloc_init:

$ gdb hello
GNU gdb 2003-09-20-cvs (cygwin-special)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-cygwin"...
(gdb) run
Starting program: /mpi/lam-7.1b18/examples/hello/hello.exe

Program received signal SIGSEGV, Segmentation fault.
0x61011e44 in my_findenv(char const*, int*) (
    name=0x46e1d4 "MALLOC_TRIM_THRESHOLD_", offset=0x22efe4)
    at ../../../winsup/cygwin/environ.cc:177
177 for (p = cur_environ (); *p; ++p)
Current language: auto; currently c++
(gdb) where
#0 0x61011e44 in my_findenv(char const*, int*) (
    name=0x46e1d4 "MALLOC_TRIM_THRESHOLD_", offset=0x22efe4)
    at ../../../winsup/cygwin/environ.cc:177
#1 0x61011eb8 in getenv (name=0x46e1d4 "MALLOC_TRIM_THRESHOLD_")
    at ../../../winsup/cygwin/environ.cc:198
#2 0x00405bdd in ptmalloc_init () at arena.c:418
#3 0x004070cf in malloc_hook_ini (sz=16, caller=0x0) at hooks.c:47
#4 0x00408be9 in malloc (bytes=16) at malloc.c:3293
#5 0x61042ee5 in malloc_init() ()
    at ../../../winsup/cygwin/malloc_wrapper.cc:264
#6 0x61004c6d in dll_crt0_1(char*) ()
    at ../../../winsup/cygwin/dcrt0.cc:731
#7 0x610051fb in _dll_crt0 () at ../../../winsup/cygwin/dcrt0.cc:942
(gdb)
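
One plausible reading of the backtrace, offered as an assumption rather than
a confirmed diagnosis: Cygwin's malloc_init calls malloc while the DLL is
still starting up, LAM's ptmalloc2 receives that call, and ptmalloc_init then
calls getenv before the environment table can be walked. The sketch below
models only that ordering hazard. The names fake_environ, safe_findenv, and
trim_threshold, and the fallback value, are hypothetical and are not Cygwin
or LAM code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's environment table.
   NULL models "process startup has not filled it in yet". */
char **fake_environ = NULL;

/* Same loop shape as environ.cc:177 in the trace, plus one added guard:
   return early instead of walking a table that is not there. */
const char *safe_findenv(const char *name)
{
    size_t len = strlen(name);
    char **p;

    if (fake_environ == NULL)
        return NULL;                /* too early: nothing to search */

    for (p = fake_environ; *p; ++p)
        if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return *p + len + 1;    /* value part after "NAME=" */
    return NULL;
}

/* Hypothetical allocator setup step: use an illustrative default when the
   variable cannot be read, whether it is unset or not available yet. */
long trim_threshold(void)
{
    const char *v = safe_findenv("MALLOC_TRIM_THRESHOLD_");
    return v ? strtol(v, NULL, 10) : 128L * 1024L;
}
```

Calling trim_threshold() while fake_environ is still NULL returns the
fallback instead of faulting; once a table is assigned, the stored value is
used. Configuring with --with-memory-manager=none sidesteps the question by
leaving LAM's allocator out of the startup path altogether.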

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Mark
Sent: Tuesday, August 31, 2004 9:31 AM
To: 'General LAM/MPI mailing list'
Subject: RE: LAM: lam7.1b16 doesn't work on Cygwin 1.5.10 Windows XP SP1

Hi Anju,

Thank you for your response; I have attached some config logs for you. Are
you using Cygwin 1.5.10-3?

I ran lamboot:
$ lamboot -v -ssi boot rsh /usr/local/etc/lam-hostmap.txt

LAM 7.1b18/MPI 2 C++ - Indiana University

n-1<3068> ssi:boot:base:linear: booting n0 (xxxxx)
n-1<3068> ssi:boot:base:linear: finished

I can run mpirun without any arguments, and I can run lamnodes as well:
$ lamnodes
n0 xxxxx.xxxxxxxxxx.xxxx.xxxxxxx.xxx:1:origin,this_node
 

Here's what I get when I run hello with mpirun:
xxxx_at_xxxxx /mpi/lam-7.1b16/examples/hello
$ mpirun C ./hello
----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
----------------------------------------------------------------------------

xxxx_at_xxxxx /mpi/lam-7.1b16/examples/hello
$

Nothing seems to happen when I run hello without mpirun:
xxxx_at_xxxxx /mpi/lam-7.1b16/examples/hello
$ hello

xxxx_at_xxxxx /mpi/lam-7.1b16/examples/hello
$

However, I built a debug 1.5.10-3 cygwin1.dll from source and used gdb to
discover a segfault:
$ gdb hello
GNU gdb 2003-09-20-cvs (cygwin-special)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-cygwin"...
(gdb) run
Starting program: /mpi/lam-7.1b16/examples/hello/hello.exe

Program received signal SIGSEGV, Segmentation fault.
0x61011e24 in my_findenv(char const*, int*) (
    name=0x45b2d6 "MALLOC_TRIM_THRESHOLD_", offset=0x22eff4)
    at ../../../winsup/cygwin/environ.cc:177
177 for (p = cur_environ (); *p; ++p)
Current language: auto; currently c++
(gdb)

I tried linking in the dmalloc library, but I got multiple definitions of
symbols; the details are below. Will these go away if I use -DWITH_DMALLOC?
Making all in laminfo
make[2]: Entering directory `/mpi/lam-7.1b18/tools/laminfo'
if g++ -DHAVE_CONFIG_H -I. -I. -I../../share/include
    -DLAM_PREFIX="\"/usr/local\"" -DLAM_BINDIR="\"/usr/local/bin\""
    -DLAM_LIBDIR="\"/usr/local/lib\"" -DLAM_INCDIR="\"/usr/local/include\""
    -DLAM_PKGLIBDIR="\"/usr/local/lib/lam\""
    -DLAM_SYSCONFDIR="\"/usr/local/etc\"" -I../../share/include
    -DLAM_BUILDING=1 -D_REENTRANT -g -Wall -Wundef -Wno-long-long
    -MT laminfo.o -MD -MP -MF ".deps/laminfo.Tpo" -c -o laminfo.o laminfo.cc; \
then mv -f ".deps/laminfo.Tpo" ".deps/laminfo.Po"; else rm -f
    ".deps/laminfo.Tpo"; exit 1; fi
/bin/bash ../../libtool --mode=link g++ -g -Wall -Wundef -Wno-long-long
    -ldmalloc -L/usr/local/lib -o laminfo.exe laminfo.o
    ../../share/libmpi/libmpi.la ../../share/liblam/liblam.la
mkdir .libs
g++ -g -Wall -Wundef -Wno-long-long -o laminfo.exe laminfo.o
    -L/usr/local/lib ../../share/libmpi/.libs/libmpi.a
    ../../share/liblam/.libs/liblam.a -ldmalloc
/usr/local/lib/libdmalloc.a(malloc.o)(.text+0xba0): In function `malloc':
/mpi/dmalloc/dmalloc-5.3.0/malloc.c:1050: multiple definition of `_malloc'
../../share/libmpi/.libs/libmpi.a(malloc.o)(.text+0x24e9):/mpi/lam-7.1b18/share/memory/ptmalloc2/malloc.c:3286: first defined here
/usr/local/lib/libdmalloc.a(malloc.o)(.text+0xbe0): In function `calloc':
/mpi/dmalloc/dmalloc-5.3.0/malloc.c:1079: multiple definition of `_calloc'
../../share/libmpi/.libs/libmpi.a(malloc.o)(.text+0x2b6f):/mpi/lam-7.1b18/share/memory/ptmalloc2/malloc.c:3516: first defined here
/usr/local/lib/libdmalloc.a(malloc.o)(.text+0xc20): In function `realloc':
/mpi/dmalloc/dmalloc-5.3.0/malloc.c:1111: multiple definition of `_realloc'
../../share/libmpi/.libs/libmpi.a(malloc.o)(.text+0x26f7):/mpi/lam-7.1b18/share/memory/ptmalloc2/malloc.c:3367: first defined here
/usr/local/lib/libdmalloc.a(malloc.o)(.text+0xca0): In function `memalign':
/mpi/dmalloc/dmalloc-5.3.0/malloc.c:1176: multiple definition of `_memalign'
../../share/libmpi/.libs/libmpi.a(malloc.o)(.text+0x28ae):/mpi/lam-7.1b18/share/memory/ptmalloc2/malloc.c:3441: first defined here
/usr/local/lib/libdmalloc.a(malloc.o)(.text+0xce0): In function `valloc':
/mpi/dmalloc/dmalloc-5.3.0/malloc.c:1206: multiple definition of `_valloc'
../../share/libmpi/.libs/libmpi.a(malloc.o)(.text+0x2a44):/mpi/lam-7.1b18/share/memory/ptmalloc2/malloc.c:3486: first defined here
/usr/local/lib/libdmalloc.a(malloc.o)(.text+0xd90): In function `free':
/mpi/dmalloc/dmalloc-5.3.0/malloc.c:1272: multiple definition of `_free'
../../share/libmpi/.libs/libmpi.a(malloc.o)(.text+0x2650):/mpi/lam-7.1b18/share/memory/ptmalloc2/malloc.c:3326: first defined here
/usr/local/lib/libdmalloc.a(malloc.o)(.text+0xdc0): In function `cfree':
/mpi/dmalloc/dmalloc-5.3.0/malloc.c:1303: multiple definition of `_cfree'
../../share/libmpi/.libs/libmpi.a(malloc.o)(.text+0x2fa2):/mpi/lam-7.1b18/share/memory/ptmalloc2/malloc.c:3670: first defined here
collect2: ld returned 1 exit status
make[2]: *** [laminfo.exe] Error 1
make[2]: Leaving directory `/mpi/lam-7.1b18/tools/laminfo'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mpi/lam-7.1b18/tools'
make: *** [all-recursive] Error 1

xxxx_at_xxxxx /mpi/lam-7.1b18
$
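
The linker errors above come from libmpi.a (through LAM's ptmalloc2) and
libdmalloc.a each defining malloc, calloc, realloc, memalign, valloc, free,
and cfree, so the link sees two definitions of every one of them. Whether
-DWITH_DMALLOC changes that is not answered in this thread. Purely as an
illustration of why distinct names avoid the clash, here is a minimal
sketch of a tracing layer that exports only its own symbols; dbg_malloc,
dbg_free, and dbg_live are hypothetical names and do not describe how
dmalloc or LAM work.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracing layer. It exports dbg_* names only, so it does not
   compete with a library that defines malloc and free itself. */
static size_t dbg_live_blocks = 0;

void *dbg_malloc(size_t n)
{
    void *p = malloc(n);        /* forwards to whichever malloc is linked */
    if (p != NULL)
        dbg_live_blocks++;      /* count only successful allocations */
    return p;
}

void dbg_free(void *p)
{
    if (p != NULL) {
        dbg_live_blocks--;
        free(p);
    }
}

/* Number of blocks handed out by dbg_malloc and not yet released. */
size_t dbg_live(void)
{
    return dbg_live_blocks;
}
```

Code that wants tracing calls dbg_malloc and dbg_free directly; a nonzero
dbg_live() at exit points to blocks that were never released.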

Thanks again,

Mark

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Prabhanjan Kambadur
Sent: Monday, August 30, 2004 11:29 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: lam7.1b16 doesn't work on Cygwin 1.5.10 Windows XP SP1

Hi,

Sorry for the late reply. I tried replicating the issue, but could not.
Could you do a couple of things for me:

1. Could you send me the config logs for the build?

2. Were you able to run any other LAM executables, such as lamboot or
mpirun? Could you try running a simple hello world program and let me know
whether you see similar results?

Anju

On Thu, 26 Aug 2004, Mark wrote:

> I sent this to the developers' list, but perhaps this list is more
> appropriate for this issue.
>
> I built lam-7.1b16 on Cygwin 1.5.10 Windows XP SP 1, and when I type in
> laminfo at the prompt the cursor returns without printing any information,
> and I get the following message when I try to run hello:
>
> xxxx_at_xxxxx /mpi/lam-7.1b16/examples/hello
> $ mpirun C hello
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
> When I debugged laminfo, I got the following:
>
> $ gdb laminfo
> GNU gdb 2003-09-20-cvs (cygwin-special)
> Copyright 2003 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-cygwin"...
> (gdb) break 136
> Breakpoint 1 at 0x4010b8: file laminfo.cc, line 136.
> (gdb) run
> Starting program: /usr/local/bin/laminfo.exe
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x6101fd44 in dlfork () from /usr/bin/cygwin1.dll
> (gdb)
>
> Does anyone have any insight into this problem?
>
> Thanks,
>
> Mark
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/