LAM/MPI General User's Mailing List Archives

From: Patrick.Schend_at_[hidden]
Date: 2007-10-23 09:44:50


Hello

I'm from Germany, so my English may not be very good.
I'm quite new to Unix and MPI, so please be gentle.

I am trying to use LAM/MPI with BLCR checkpoint/restart under SUSE Linux 10.2
in a two-node LAN cluster where both machines run the same SUSE version.
LAM and BLCR are the same version on both machines:
LAM 7.1.4
BLCR 0.5.6
I have written a small program that is much like the ring example from
your examples. It sends messages between the nodes depending on the rank they
have.
It is written for 2 processes.

Booting the LAM universe is no problem.

After that I compile my program on each machine with
'mpicc lam_test.c -o lamtest'
and start it with 'mpirun C -ssi rpi crtcp -ssi cr blcr lamtest'.
Everything works just fine.
To checkpoint I use 'lamcheckpoint -cr blcr -pid <pid of mpirun>'.
To restart I use
'lamrestart -ssi cr blcr -ssi cr_blcr_context_file context.mpirun.pid'.
I have tried 'cr_checkpoint' and 'cr_restart' too.

I always get the same error:
"Restart failed: Resource temporarily unavailable"

It's the same when I try to restart an MPI program that ran locally. Maybe I'm
missing something?
BLCR alone works just fine on this machine.

Help would be very much appreciated
Thanks,
Patrick Schend

As I'm still quite new to MPI and C, I have copied my source code here:

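#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <unistd.h>   /* getpid, sleep */

#define ROUNDS 10     /* number of ping-pong exchanges */

int main(int argc, char *argv[])
{
    int numtasks, zaehler = 0, rank, dest, source, rc, count, tag = 1;
    int inmsg;
    int outmsg = 0;
    MPI_Status Stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The program is written for exactly 2 processes. */
    if (numtasks != 2) {
        if (rank == 0)
            fprintf(stderr, "This program needs exactly 2 processes.\n");
        MPI_Finalize();
        exit(1);
    }

    if (rank == 0) {
        /* Sender: sends the counter, then waits for the reply. */
        dest = 1;
        source = 1;
        while (zaehler < ROUNDS) {
            printf("Started as sender with pid: %d\n", (int)getpid());
            rc = MPI_Send(&zaehler, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
            rc = MPI_Recv(&inmsg, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
            zaehler++;
            sleep(5);
        }
    }
    else {
        /* Receiver: stops once the last counter value (ROUNDS - 1) has
           arrived, so it never waits for a message that is not sent. */
        dest = 0;
        source = 0;
        do {
            printf("Started as receiver with pid: %d\n", (int)getpid());
            rc = MPI_Recv(&zaehler, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
            rc = MPI_Send(&outmsg, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
            printf("zaehler: %d\n", zaehler);
        } while (zaehler < ROUNDS - 1);
    }

    /* The messages are sent as MPI_INT, so count them as MPI_INT. */
    rc = MPI_Get_count(&Stat, MPI_INT, &count);
    printf("Task %d: Received %d int(s) from task %d with tag %d\n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);
    fflush(stdout);

    MPI_Finalize();
    return 0;
}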
