Hi all,
I am Abhik Sarkar. I am currently working on a Beowulf cluster running
LAM 6.5.6. For a while I have been trying to use MPI together with
threads: the main thread performs a decoding operation, and whenever it
needs data from another process, it prompts a child (POSIX) thread to do
the communication. The simple program below implements this scenario.
Two processes on a single node each create a thread; through its thread,
one process calls MPI_Send, and the other calls MPI_Probe followed by
MPI_Recv. All calls into LAM are made from the child threads. MPI_Send()
appears to work fine, but MPI_Recv() fails.
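A minimal sketch of the hand-off described above, assuming a mutex and condition variable are used to signal between the decoding thread and the communication thread. The struct and function names are hypothetical, and the MPI receive is replaced by a strcpy so the sketch runs on its own.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical request slot shared by the decoding (main) thread and the
 * communication thread; all names are illustrative. */
struct comm_request {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int  pending;      /* 1 while a request waits for the comm thread */
    int  done;         /* 1 once the comm thread has filled data      */
    char data[16];     /* where the "received" data is placed         */
};

void comm_request_init(struct comm_request *r)
{
    memset(r, 0, sizeof *r);
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->cond, NULL);
}

/* Communication thread: wait for one request, obtain the data, signal back.
 * In the real program the strcpy would be the MPI_Probe/MPI_Recv pair. */
void *comm_thread(void *arg)
{
    struct comm_request *r = arg;
    pthread_mutex_lock(&r->lock);
    while (!r->pending)
        pthread_cond_wait(&r->cond, &r->lock);
    strcpy(r->data, "abhik");          /* placeholder for the received message */
    r->pending = 0;
    r->done = 1;
    pthread_cond_broadcast(&r->cond);
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Called by the decoding thread when it needs remote data: post a request,
 * then block until the communication thread reports completion. */
void request_data(struct comm_request *r, char *out, size_t cap)
{
    pthread_mutex_lock(&r->lock);
    r->pending = 1;
    pthread_cond_broadcast(&r->cond);
    while (!r->done)
        pthread_cond_wait(&r->cond, &r->lock);
    strncpy(out, r->data, cap - 1);
    out[cap - 1] = '\0';
    pthread_mutex_unlock(&r->lock);
}
```

The decoding thread calls request_data() and blocks only for as long as the communication thread needs; the while loops around pthread_cond_wait() guard against spurious wakeups and against the request being posted before the communication thread starts waiting.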
The following is the code.
--------------------------
#include <stdio.h>
#include <pthread.h>
#include <mpi.h>
#include <curses.h>
#include <string.h>
#include <unistd.h>

char c;
int rank;

struct procnum {
    int *argc1;
    char ***argv1;
};

void *thrdprobe(void *t)
{
    int len;
    char *s;
    MPI_Status stat;
    int ret;

    MPI_Init(((struct procnum *)t)->argc1, ((struct procnum *)t)->argv1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("my rank is %d\n", rank);
    if (rank == 0)
    {
        s = "abhik";
        len = strlen(s);
        MPI_Send(s, len + 1, MPI_CHAR, 1, 17, MPI_COMM_WORLD);
        printf("sent\n");
    }
    else
    {
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
        printf("the return stat is done\n");
        MPI_Recv(s, 10, MPI_CHAR, 0, 17, MPI_COMM_WORLD, &stat);
        printf("my name is %s\n", s);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    printf("barrier passed\n");
    MPI_Finalize();
    return(0);
}

int main(int argc, char *argv[])
{
    pthread_t thr_id;
    struct procnum s;
    void *thrd_stat;

    s.argc1 = &argc;
    s.argv1 = &argv;
    if (!pthread_create(&thr_id, NULL, thrdprobe, (void *)&s))
        printf("created a thread\n");
    else
        perror("pthread_create\n");
    pthread_join(thr_id, &thrd_stat);
    printf("killing thread\n");
    return(0);
}
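Separately from the crash, the error branch in main() uses perror(), but pthread_create() reports failure through its return value (an error number) and POSIX does not require it to set errno, so perror() may print an unrelated reason. A small sketch of reporting the returned code instead; start_thread and echo_thread are hypothetical helper names, not part of the original program.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread body used only to exercise start_thread(). */
void *echo_thread(void *arg)
{
    return arg;   /* hand the argument back through pthread_join() */
}

/* Start a thread and report failure using the code pthread_create()
 * returns, since errno is not guaranteed to describe the error. */
int start_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    int ret = pthread_create(tid, NULL, fn, arg);
    if (ret != 0)
        fprintf(stderr, "pthread_create: %s\n", strerror(ret));
    else
        printf("created a thread\n");
    return ret;
}
```

In the posted main() this would replace the if/else with something like `if (start_thread(&thr_id, thrdprobe, &s) != 0) return 1;`, which also avoids calling pthread_join() on a thread that was never created.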
The following is the output, including the error messages:
---------------------------------------------------------
created a thread
created a thread
my rank is 0
sent
my rank is 1
the return stat is done
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - main()
MPI process rank 1 (n0, p1193) caught a SIGSEGV in MPI_Recv.
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return 0"
or "exit(0)" in your C code before exiting the application.
PID 1188 failed on node n0 with exit status 1.
-----------------------------------------------------------------------------
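One likely cause of the SIGSEGV: on rank 1 the pointer s is declared but never initialised before MPI_Recv(s, 10, ...), so the received bytes are written through an indeterminate address. MPI_Recv needs storage the caller already owns. The sketch below shows a probe, size, allocate, receive order; fake_recv and receive_name are hypothetical stand-ins so the sketch compiles without MPI, and the comment lists the corresponding MPI-1 calls (MPI_Get_count is a standard MPI-1 routine).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MPI_Recv on MPI_CHAR data: like the real call it
 * writes into storage the caller must already own, and it fails if the
 * message is longer than count. */
int fake_recv(char *buf, int count, const char *incoming)
{
    int n = (int)strlen(incoming) + 1;      /* message length incl. NUL */
    if (n > count)
        return -1;                          /* would not fit: truncation */
    memcpy(buf, incoming, (size_t)n);
    return 0;
}

/* Rank 1's receive done in a safe order. In the MPI program the same steps
 * would be (source 0, tag 17 as in the posted code):
 *   MPI_Probe(0, 17, MPI_COMM_WORLD, &stat);
 *   MPI_Get_count(&stat, MPI_CHAR, &count);
 *   s = malloc(count);
 *   MPI_Recv(s, count, MPI_CHAR, 0, 17, MPI_COMM_WORLD, &stat);
 */
char *receive_name(const char *incoming)
{
    int count = (int)strlen(incoming) + 1;  /* plays the role of MPI_Get_count */
    char *s = malloc((size_t)count);        /* s now points at real storage */
    if (s == NULL)
        return NULL;
    if (fake_recv(s, count, incoming) != 0) {
        free(s);
        return NULL;
    }
    return s;                               /* caller frees */
}
```

For a fixed-size message, a simpler variant is a separate receive array on rank 1 (for example `char buf[10];` passed to MPI_Recv), keeping s as the pointer that rank 0 sends from.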
We urgently need help. Please reply with respect to LAM 6.5.6 only.
With regards,