LAM/MPI General User's Mailing List Archives

From: William Bierman (wbierman_at_[hidden])
Date: 2005-03-29 00:59:39


I seem to have discovered a bug in LAM. I am not saying that simply
because something isn't working and I assume LAM must be at fault; I
have tried many different scenarios, all with the same result. When I
run lamboot, everything loads properly. If I run lamexec N uname -s, I
get the output I would expect. However, if I try to run a simple hello
world MPI program, I get the following error:

$ mpirun C ./h
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n-1077941600).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

and it creates a corefile, h.core. I get the following backtrace:

(gdb) bt
#0 0x280e5d5d in pthread_key_create () from /usr/lib/libpthread.so.1
#1 0x0805e7ef in ptmalloc_init ()
#2 0x080604ef in malloc_hook_ini ()
#3 0x080603f5 in malloc ()
#4 0x280eb21a in pthread_mutex_init () from /usr/lib/libpthread.so.1
#5 0x280f4cf0 in pthread_setconcurrency () from /usr/lib/libpthread.so.1
#6 0x280f4761 in pthread_setconcurrency () from /usr/lib/libpthread.so.1
#7 0x280f7e76 in pthread_testcancel () from /usr/lib/libpthread.so.1
#8 0x280f8fee in __error () from /usr/lib/libpthread.so.1
#9 0x280e0792 in ?? () from /usr/lib/libpthread.so.1
#10 0x280aa6c5 in find_symdef () from /libexec/ld-elf.so.1
#11 0x280a951b in _rtld () from /libexec/ld-elf.so.1
#12 0x280a8966 in .rtld_start () from /libexec/ld-elf.so.1

bill_at_c1:~
$ lamboot -V

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

        Arch: i386-unknown-freebsd5.3
        Prefix: /usr/local
        Configured by: root
        Configured on: Tue Feb 22 16:42:25 HST 2005
        Configure host: cluster.uhhcsdept.int
        SSI rpi: crtcp lamd sysv tcp usysv

Here is my hello world code, taken from
http://www.eecis.udel.edu/~saunders/courses/372/01f/manual/manual.html:

#include <stdio.h>
#include <mpi.h>

/*NOTE: The MPI_Wtime calls can be placed anywhere between the MPI_Init
and MPI_Finalize calls.*/

int main(int argc, char **argv)
{
   int node;
   double mytime; /*declare a variable to hold the time returned*/

   MPI_Init(&argc,&argv);
   mytime = MPI_Wtime(); /*get the time just before work to be timed*/
   MPI_Comm_rank(MPI_COMM_WORLD, &node);

   printf("Hello World from Node %d\n",node);

   mytime = MPI_Wtime() - mytime; /*get the time just after work is done
                                    and take the difference */
   printf("Timing from node %d is %lf seconds.\n",node,mytime);
   MPI_Finalize();

   return 0;
}

I also tried some old code that, iirc, ran perfectly under LAM 7.0.1,
and got the same result.

I am running FreeBSD 5.3-RELEASE and have 21 nodes on my cluster.

Any advice? I am more than willing to provide whatever additional
information may be required.

Thanks in advance!

Bill