LAM/MPI General User's Mailing List Archives

From: Rui Ramos (rramos_at_[hidden])
Date: 2006-09-28 12:10:03


Hi there,

 I have been integrating BLCR with LAM/MPI and I'm running into a problem.

 When I try to restart from a context file with lamrestart, I get:
 
 "cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy"

 This is what I've done:
   - laminfo
             LAM/MPI: 7.1.2
              Prefix: /opt/lam
        Architecture: x86_64-unknown-linux-gnu
       Configured by: root
       Configured on: Thu Sep 28 15:23:39 WEST 2006
      Configure host: XXXXXXXXX
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: gcc
        C++ compiler: g++
    Fortran compiler: g77
     Fortran symbols: double_underscore
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)
              SSI cr: blcr (API v1.0, Module v1.1)
              SSI cr: self (API v1.0, Module v1.0)

 - I've tested BLCR with single processes (without LAM), and it works.
 - Then I tried a simple test program:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int node;
   int i, j;
   float f;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &node);

   printf("Hello World from Node %d.\n", node);

   /* Long-running loop (about 10^10 iterations) that keeps the job
      alive long enough to be checkpointed. */
   for (j = 0; j <= 100000; j++)
      for (i = 0; i <= 100000; i++) {
         f = i * 2.718281828 * i + i + i * 3.141592654;
         printf("Iteration [i=%d,j=%d]=%f\n", i, j, f);
      }

   MPI_Finalize();
   return 0;
}

 Compiled it with:

 mpicc -o hello-lam mpihello.c -L/opt/lam/lib/ -I/opt/lam/include -lmpi

 Executed it with:

 mpirun -ssi cr blcr C hello-lam

 Created the context file with:
 
 lamcheckpoint -ssi cr blcr -pid 22286 -ssi cr_blcr_base_dir /tmp

 Killed the process:

 kill -9 22286

 Tried to restart from the context file:

 lamrestart -ssi cr blcr -ssi cr_blcr_context_file /tmp/context.mpirun.22286

 which returns:
 
 cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy

 Any ideas on how I could solve this?

                                       Any help is appreciated :)

PS: BLCR is version blcr-0.4.2.

--
Rui Ramos
==============================================
Universidade do Porto - IRICUP
Praça Gomes Teixeira, 4099-002 Porto, Portugal
email: rramos[at]iric.up.pt
==============================================