LAM/MPI General User's Mailing List Archives

From: Paul H. Hargrove (PHHargrove_at_[hidden])
Date: 2009-01-27 02:51:14


I am willing to work on any BLCR problem, if there is one. However, the
"SSI types" message is from LAM's checkpoint code.
Somebody better acquainted with the LAM/MPI code may be able to help you
track down where the error occurs.
-Paul

Gleb "Crazy Sage" Igumnov wrote:
> When I run my MPI program on a single CPU, everything is OK, but when
> I try to run it in parallel using
> mpirun -ssi rpi crtcp -ssi cr blcr -np 2 ./hello
> and checkpoint it with
> lamcheckpoint -ssi cr blcr -pid mpirun_pid
> or
> cr_checkpoint -p mpirun_pid --term
> I get the following error message:
> -chkpt_watchdog: 'mpirun' (tgid/pid xxx/xxx) exited with signal 11
> during checkpoint.
> -----------------------------------------------------------------------
> Encountered a failure in the SSI types while continuing from
> checkpoint. Aborting in despair :-(
> -----------------------------------------------------------------------
> Segmentation Error
>
> No checkpoint file is created, and the process is terminated.
> At the moment I don't understand whether the problem is in my program,
> in the LAM/MPI or BLCR settings, or in the virtual machine platform.
>
> LAM-MPI version is 7.1.4
> LAM-MPI is configured with --with-rpi=crtcp --with-cr-blcr
> BLCR version is 0.8.0
> BLCR is configured with --enable-static
> OS - CentOS 5
> Platform - Sun xVM VirtualBox
>
> Program text:
>
> #include <stdio.h>
> #include <mpi.h>
> #include <math.h>
>
> int main (int argc, char *argv[]){
>     int rank, size, i;
>     long j;
>     double x;
>     x=5;
>     MPI_Init(&argc,&argv);
>     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>     MPI_Comm_size(MPI_COMM_WORLD,&size);
>     for (i=0;i<100;i++){
>         printf("Hello, world! I am %d of %d, iteration %d\n",rank,size,i);
>         /* Busy loop to keep each iteration running for a while. */
>         for(j=0;j<100000000;j++){
>             x=sin(x);
>         }
>     }
>     printf("I am %d of %d. Modus - %f\n",rank,size,x);
>     MPI_Finalize();
>     return 0;
> }
>
>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory