To understand the problem better, I did the following.
I ran View3d_server as usual, i.e.,
> mpirun -np 1 View3d_server
and then ran the client directly, without mpirun:
> client
Both the client and the server crashed with the same error message.
So using Matlab is not the cause; starting the client without
mpirun is.
The following is from Jeff some time ago:
> Hence, you must lamboot before you run any MPI application under LAM.
> You can do this before you run matlab,
Yes, I did lamboot before starting the server and before starting the C
client or mex client.
> So, once you have a LAM universe, you can launch MPI jobs in one of
> three ways:
> 1. "Singleton", where you just "./a.out" (where a.out invokes
> MPI_Init). This will make a.out be an MPI process, and it will have an
> MPI_COMM_WORLD size of 1.
I ran my C client today without using mpirun. I checked the size of
MPI_COMM_WORLD and found it to be 1, and the client process has
rank 0, so it really is a singleton.
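For reference, a minimal version of that check (run directly as
./singleton_check after lamboot, with no mpirun; a sketch, not my
actual client):

-------------------- singleton_check.c (sketch) --------------------
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* 1 for a singleton */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* 0 */
    printf("MPI_COMM_WORLD size = %d, rank = %d\n", size, rank);
    MPI_Finalize();
    return 0;
}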
> 2. With mpirun.
> 3. With MPI_Comm_spawn[_multiple].
> So what I was suggesting with that design is that you will probably
> lamboot before you run matlab (or you can make your mex script smart
> enough to run lamboot itself), and then have a mex interface that calls
> MPI_Init. This will give you a singleton MPI process, where you can
> look for published names, etc. Then you can spawn a master or connect
> to the existing master... etc.
Today's experiment takes Matlab out of the equation. The question
now is why I am unable to connect to the server from an MPI
singleton that was started without mpirun.
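To get more information than the generic abort, one thing I will try
(a sketch, assuming LAM honors MPI_ERRORS_RETURN for the
dynamic-process calls) is switching the error handler so that
MPI_Comm_connect() hands back an error code I can print:

-------------------- connect_probe.c (sketch) --------------------
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    char msg[MPI_MAX_ERROR_STRING];
    int err, len;
    MPI_Comm server;

    MPI_Init(&argc, &argv);

    /* ask for error codes instead of aborts */
    MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);

    strcpy(port_name, "n0:i11:323");   /* port printed by the server */
    err = MPI_Comm_connect(port_name, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &server);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &len);
        fprintf(stderr, "MPI_Comm_connect failed: %s\n", msg);
    }
    MPI_Finalize();
    return 0;
}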
Two related questions:
(1) When my client is run from mpirun but the server has not been
started, a call to MPI_Lookup_name() crashes with the following
error (a possible workaround is sketched after question (2) below):
MPI_Lookup_name: publishing service: name is not published (rank 0,
MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Lookup_name()
Rank (0, MPI_COMM_WORLD): - main()
(2) If I Ctrl-C the server, it obviously never gets a chance to
call MPI_Unpublish_name(). The next time I start the server, it
crashes with the error:
MPI_Publish_name: publishing service: name is published (rank 0,
MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Publish_name()
Rank (0, MPI_COMM_WORLD): - main()
I can lamboot again to work around this. But is there a way to deal
with the left-over published name from within my server code? (One
idea is sketched below.)
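On question (1), I suspect the crash is just the default error
handler: MPI_COMM_WORLD starts with MPI_ERRORS_ARE_FATAL, so the
status check I commented out in mex_client.c (quoted below) never
gets a chance to run; LAM aborts inside MPI_Lookup_name() first.
The same MPI_ERRORS_RETURN trick as in the connect sketch above
should let the client probe for the name and survive, assuming LAM
honors it for the name-service calls:

-------------------- lookup_probe.c (sketch) --------------------
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    int status;

    MPI_Init(&argc, &argv);

    /* ask for error codes instead of aborts */
    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    status = MPI_Lookup_name("MPI_SERVER1", MPI_INFO_NULL, port_name);
    if (status != MPI_SUCCESS)
        printf("****** MPI server not up yet.\n");
    else
        printf("server port: %s\n", port_name);

    MPI_Finalize();
    return 0;
}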
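On question (2), MPI_Unpublish_name() takes both the service name
and the port name that was originally published, so a fresh server
instance cannot easily remove a stale entry (it never knew the old
port). What I can do is keep the name from leaking in the first
place: catch SIGINT and unpublish before exiting. A sketch (my
names, untested; calling MPI from a signal handler is not strictly
safe, but as a last-gasp cleanup it may be acceptable):

-------------------- sigint_cleanup.c (sketch) --------------------
#include <signal.h>
#include <stdlib.h>
#include <mpi.h>

/* filled in by main() after MPI_Open_port()/MPI_Publish_name() */
static char g_port_name[MPI_MAX_PORT_NAME];

static void on_sigint(int sig)
{
    (void)sig;
    /* unpublish so the next server start does not collide */
    MPI_Unpublish_name("MPI_SERVER1", MPI_INFO_NULL, g_port_name);
    MPI_Close_port(g_port_name);
    MPI_Finalize();
    exit(0);
}

/* in main(), right after MPI_Publish_name() succeeds:
       signal(SIGINT, on_sigint);                       */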
Thanks,
-Lei
Lei_at_ICS wrote:
>I tried something crazy -- I ran matlab from mpirun:
>mpirun -np 1 matlab -nodesktop
>Now my mex_client can connect to my MPI server
>without crashing. Wow! :)
>
>This isn't the way I want to run matlab; it should not be necessary!
>Indeed, other weird things happened when I ran matlab from mpirun.
>
>But why did mpirun help in this case? What is the right way
>to start an MPI singleton from matlab via mex?
>
>-------------------- mex_client.c (excerpt) --------------------
> char port_name[MPI_MAX_PORT_NAME];
> int size, rank, status;
> MPI_Comm server;
>
> MPI_Init(NULL, NULL);
> strcpy(port_name, "n0:i11:323");   /* port printed by the server */
>
> MPI_Comm_size(MPI_COMM_WORLD, &size);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
> /* disabled: MPI_Lookup_name() crashes (see my earlier emails)
> status = MPI_Lookup_name("MPI_SERVER1", MPI_INFO_NULL, port_name);
> if (status != MPI_SUCCESS) {
>     printf("****** MPI server not up yet.\n");
> }
> */
>
> MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>                  &server);
>
>------------------- View3d_server.c (excerpt) -------------------------
> char sport_name[MPI_MAX_PORT_NAME];
> int num_proc, my_rank;
> MPI_Comm comm_client;
>
> MPI_Init(&argc, &argv);
> MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
> MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>
> MPI_Open_port(MPI_INFO_NULL, sport_name);
>
> while (1) {
>     MPI_Comm_accept(sport_name, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>                     &comm_client);
>     ... ... ...
> }
>
>Lei_at_ICS wrote:
>
>>Hi,
>>
>>I have an MPI server which opens a port, prints out the port name,
>>and waits for connection. My client, in C, will use the printed
>>port name as its argument to connect to the server and send
>>a message to the server. The reason I do not use published
>>names is that my call to MPI_Lookup_name() would crash
>>(see my earlier emails; that's a different problem).
>>
>>Now the server and client above work fine until I make the client
>>a MEX function called from Matlab. Then the server crashes with the
>>following error:
>>losangeles[48]% mpirun -np 1 View3d_server
>>MPI_SERVER available at n0:i11:323
>>**** before MPI_Comm_accept ...
>>MPI_Comm_accept: mismatched run-time flags: Bad address (rank 0,
>>MPI_COMM_WORLD)
>>Rank (0, MPI_COMM_WORLD): Call stack within LAM:
>>Rank (0, MPI_COMM_WORLD): - MPI_Comm_accept()
>>Rank (0, MPI_COMM_WORLD): - main()
>>-----------------------------------------------------------------------------
>>One of the processes started by mpirun has exited with a nonzero exit
>>
>>And the MEX client crashes with the following error:
>>
>>>> mex_client()
>>
>>*** port_name: n0:i11:323
>>Rank (0, MPI_COMM_WORLD): Call stack within LAM:
>>Rank (0, MPI_COMM_WORLD): - MPI_Comm_connect()
>>Rank (0, MPI_COMM_WORLD): - main()
>>MPI_Comm_connect: unclassified: Too many open files (rank 0, MPI_COMM_WORLD)
>>
>>My client (C or MEX) is very simple and it does not open any files.
>>My LAM (7.1.1) was built with the options:
>>
>>--without-threads --with-memory-manager=none
>>
>>Any suggestions on how to solve this problem? Has anybody
>>actually done this before?
>>
>>Thanks a lot for your help!
>>
>>-Lei