LAM/MPI General User's Mailing List Archives

From: Paul Haney (haney411_at_[hidden])
Date: 2005-06-01 19:32:26


Hi,
I have Matlab code that uses MEX files to call MPI routines. I can compile
this code to a standalone executable and run parallel jobs; however, I get
system crashes maybe 30-40% of the time if I use 1 processor (2 jobs per
processor), and crashes 80-90% of the time if I use > 1 processor. Some
details:
I'm using LAM 7.0.4, gcc 3.2.3, Matlab 7.
It seems as though the crash occurs right away in MPI_Init. I can see which
process executes the first command by calling clock(), and a run can
succeed whether node0 or node1 starts first. Does anyone have any advice
on how to proceed?
Here's the LAM error message upon crash:

-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

Here's some info that Matlab gives me:

------------------------------------------------------------------------
       Segmentation violation detected at Wed Jun 1 18:45:50 2005
------------------------------------------------------------------------

Configuration:
  MATLAB Version: 7.0.4.352 (R14) Service Pack 2
  MATLAB License: unknown
  Operating System: Linux 2.4.20-30.9.papismp #1 SMP Mon May 3 13:57:07 CDT
2004 i686
  Window System: No active display
  Current Visual: None
  Processor ID: x86 Family 15 Model 2 Stepping 9, GenuineIntel
  Virtual Machine: Java 1.5.0 with Sun Microsystems Inc. Java HotSpot(TM)
Client VM
    (mixed mode)
  Default Charset: ibm-923

Register State:
  eax = 08200000 ebx = 40138b18
  ecx = 00000000 edx = 08218468
  esi = 084e7d48 edi = 00040b74
  ebp = bfff89f4 esp = bfff89dc
  eip = 4012c087 flg = 00010216

Stack Trace:
  [0] libpthread.so.0:__pthread_mutex_lock~(265076, 0x085f7050
"lsf-543646-0", 0x08218468 "/tmp", 0x085f7068 "LAM_MPI_SESSION_SUFFI\
X=lsf-54364..") + 23 bytes

Error in ==> MPI_Init at 3

Error in ==> HelloWorld at 4

---------------------------------------------

Even if the code doesn't crash and successfully says 'Hello world' from all
of the nodes, I get the following error:
--------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 8374 failed on node n0 (129.114.62.145) with exit status 22.