On Thu, Jul 17, 2003 at 03:35:51PM +0530, abhik.sarkar_at_[hidden] wrote:
> Hi All,
>
> I am Abhik Sarkar. I am currently working on a Beowulf cluster
> running LAM 6.5.6. For a while I have been trying to use MPI
> with threads, such that the main thread performs a decoding
> operation and, as and when it requires data from another process, it
> prompts a child (POSIX) thread to do the communication. A simple program
> implements the scenario: two processes on a single node each create
> a thread; one process's thread calls MPI_Send, and the other's
> calls MPI_Probe followed by MPI_Recv. All calls into LAM are made
> from the child threads. MPI_Send() appears to work fine, but
> MPI_Recv() fails.
>
> The following is the code.
> --------------------------
>
> #include<stdio.h>
> #include<pthread.h>
> #include<mpi.h>
> #include<curses.h>
> #include<string.h>
> #include<unistd.h>
>
> char c;
> int rank;
> struct procnum{
>     int *argc1;
>     char ***argv1;
> };
>
> void * thrdprobe(void *t)
> {
>     int len;
>     char *s;
>     MPI_Status stat;
>     int ret;
>     MPI_Init(((struct procnum *)t)->argc1,((struct procnum *)t)->argv1);
>     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>
>     printf("my rank is %d\n",rank);
>     if(rank==0)
>     {
>         s="abhik";
>         len=strlen(s);
>         MPI_Send(s,len+1,MPI_CHAR,1,17,MPI_COMM_WORLD);
>         printf("sent\n");
>     }
>     else
>     {
>         MPI_Probe(MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,&stat);
>         printf("the return stat is done\n");
>         MPI_Recv(s,10,MPI_CHAR,0,17,MPI_COMM_WORLD,&stat);
>
>         printf("my name is %s\n",s);
>     }
>     MPI_Barrier(MPI_COMM_WORLD);
>     printf("barrier passed\n");
>     MPI_Finalize();
>     return(0);
> }
>
> int main(int argc,char *argv[])
> {
>     pthread_t thr_id;
>     struct procnum s;
>     void *thrd_stat;
>     s.argc1=&argc;
>     s.argv1=&argv;
>
>     if(!pthread_create(&thr_id,NULL,thrdprobe,(void *)&s))
>         printf("created a thread\n");
>     else
>         perror("pthread_create\n");
>     pthread_join(thr_id,&thrd_stat);
>     printf("killing thread\n");
>     return(0);
> }
>
>
>
>
> The following is the output with errors on standard I/O.
> ---------------------------------------------------------
>
> created a thread
> created a thread
> my rank is 0
> sent
> my rank is 1
> the return stat is done
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - main()
> MPI process rank 1 (n0, p1193) caught a SIGSEGV in MPI_Recv.
> -----------------------------------------------------------------------------
>
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
> PID 1188 failed on node n0 with exit status 1.
> -----------------------------------------------------------------------------
>
> We urgently need help. Please reply with respect to LAM 6.5.6 only.

Abhik, the problem is a simple C coding error. In rank 1 the pointer s
is never initialized, so MPI_Recv() writes the incoming data through an
invalid address, which is the SIGSEGV shown in the trace. s needs to
reference a buffer large enough to receive the message being sent by
rank 0 (6 bytes here: "abhik" plus the terminating NUL), e.g.
s = malloc(10);
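
A minimal sketch of the corrected receive path follows. It is kept
single-threaded to isolate the buffer handling, and it assumes the same
tag (17) and ranks as the posted program. Sizing the buffer with
MPI_Probe() and MPI_Get_count() instead of a fixed 10 bytes is an
addition for illustration, not part of the original code; a fixed-size
malloc(10) would work equally well for this message.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        /* Sender: the array already provides valid storage. */
        char msg[] = "abhik";
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 1, 17, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status stat;
        int count;
        char *buf;

        /* Wait for the message, then ask how many chars it carries. */
        MPI_Probe(0, 17, MPI_COMM_WORLD, &stat);
        MPI_Get_count(&stat, MPI_CHAR, &count);

        /* Allocate the receive buffer BEFORE calling MPI_Recv. */
        buf = malloc((size_t)count);
        if (buf == NULL)
            MPI_Abort(MPI_COMM_WORLD, 1);

        MPI_Recv(buf, count, MPI_CHAR, 0, 17, MPI_COMM_WORLD, &stat);
        printf("my name is %s\n", buf);
        free(buf);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```

Built with mpicc and started with two processes (for example
mpirun -np 2 ./a.out), rank 1 receives into memory it owns. The same
change applies inside thrdprobe(): assign s a valid buffer in the else
branch before the MPI_Recv() call.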
-nick