
LAM/MPI General User's Mailing List Archives


From: rama krishna (joys_623_at_[hidden])
Date: 2007-03-13 05:03:59


Hi everybody,
   
I am using AIX LoadLeveler 3.1 to checkpoint a simple serial application. The problem occurs when the checkpoint file is generated: LoadLeveler produces a file with a .err extension (ckptname.err) instead of a proper checkpoint file. When restart_from_ckpt is set to yes in the job command file and I run the job, the node simply removes the job from the queue and I do not get any output file.
   
I am posting my job command file and application below. If anybody knows why the checkpoint file is not generated in the correct format, or how to debug the problem, please let me know. Thanks in advance.
   
   
My job command file:

# For First.c
# @ job_type = serial
# @ executable = first
# @ output = stp.out
# @ error = stp.err
# @ class = general
# @ checkpoint = yes
# @ restart_from_ckpt = yes
# @ ckpt_dir = /home/rtsg/crypt/ramakrishna/trial/ex/
# @ ckpt_file = stp.ckpt
# @ restart_on_same_nodes = yes
# @ requirements = Machine == "tf04"
# @ wall_clock_limit = 5:00:00,4:30:00
# @ queue

   
My application:
   
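#include <stdio.h>
#include "llapi.h"   /* LoadLeveler API: LL_ckpt_info, ll_init_ckpt() */

int main(void)
{
    int i;
    int rc;
    LL_ckpt_info ckpt_info;
    cr_error_t cp_error1;

    ckpt_info.version = LL_API_VERSION;
    ckpt_info.step_id = NULL;
    /* 0 rather than NULL: these are integer/enum fields, not pointers */
    ckpt_info.ckptType = 0;
    ckpt_info.waitType = 0;
    ckpt_info.abort_sig = 0;
    ckpt_info.cp_error_data = &cp_error1;
    ckpt_info.ckpt_rc = 0;
    ckpt_info.soft_limit = 0;
    ckpt_info.hard_limit = 0;

    for (i = 1; i < 4000; i++)
    {
        printf("%d\n", i);
        if (i == 2000)
        {
            /* request a checkpoint halfway through the loop */
            rc = ll_init_ckpt(&ckpt_info);
            /* report a nonzero return code (assumed to mean failure) on
               stderr, which goes to stp.err, to help with debugging */
            if (rc != 0)
                fprintf(stderr, "ll_init_ckpt returned %d (ckpt_rc = %d)\n",
                        rc, ckpt_info.ckpt_rc);
        }
    }
    return 0;
}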
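As a first debugging step, it may help to read whatever was written into the .err file. The sketch below is a hypothetical helper: check_ckpt, join_path and dump_if_present are illustrative names, not LoadLeveler API calls, and it uses only standard C I/O. It assumes the error file sits in ckpt_dir and is named <ckpt_file>.err, which is how the file described in the post appears to be named; adjust the suffix if the real name differs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of LoadLeveler.
 * ASSUMPTION: a failed checkpoint leaves <ckpt_file>.err in ckpt_dir. */

/* Build "<dir>/<file><suffix>" into buf, adding '/' only if dir lacks one.
 * Returns 0 on success, -1 if the result does not fit in len bytes. */
static int join_path(char *buf, size_t len, const char *dir,
                     const char *file, const char *suffix)
{
    size_t n = strlen(dir);
    const char *sep = (n > 0 && dir[n - 1] == '/') ? "" : "/";
    int w = snprintf(buf, len, "%s%s%s%s", dir, sep, file, suffix);
    return (w >= 0 && (size_t)w < len) ? 0 : -1;
}

/* Copy a file's contents to out. Returns 1 if the file could be opened, else 0. */
static int dump_if_present(const char *path, FILE *out)
{
    FILE *f = fopen(path, "r");
    int c;
    if (f == NULL)
        return 0;
    while ((c = fgetc(f)) != EOF)
        fputc(c, out);
    fclose(f);
    return 1;
}

/* Report on the checkpoint file and its .err companion.
 * Returns a bit mask (1 = checkpoint file readable, 2 = .err file readable),
 * or -1 if a path was too long. */
int check_ckpt(const char *dir, const char *file)
{
    char ckpt[4096], err[4096];
    int status = 0;
    FILE *f;

    if (join_path(ckpt, sizeof ckpt, dir, file, "") != 0 ||
        join_path(err, sizeof err, dir, file, ".err") != 0)
        return -1;

    f = fopen(ckpt, "r");
    if (f != NULL) {
        fclose(f);
        status |= 1;
        printf("checkpoint file present: %s\n", ckpt);
    } else {
        printf("checkpoint file missing: %s\n", ckpt);
    }

    printf("--- %s ---\n", err);
    if (dump_if_present(err, stdout))
        status |= 2;
    else
        printf("(not found)\n");
    return status;
}
```

For the job above, calling check_ckpt("/home/rtsg/crypt/ramakrishna/trial/ex/", "stp.ckpt") from a small main would report whether stp.ckpt exists and print the text of stp.ckpt.err, if there is one, which may say why the checkpoint failed.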
