On Thu, 19 Jun 2003, Andrey Slepuhin wrote:
> [snipped]
> The same socket. The main idea behind my question is that most MPI
> applications (especially mesh-based) do some computations, then
> MPI_Barrier(), then data exchange, so interprocess communications are
> not spread out in time, but are done synchronously and this is a
> bottleneck.
So your main concern is to optimize the latency between MPI processes.
Correct?
> [snipped]
> Really what I want to have is something like this (in lam-bhost.def):
Minor note: not necessarily lam-bhost.def, but whatever boot schema
is used (i.e., even if the user provides one on the command line).
> ...
> node-1 cpu=2 (192.168.0.1 192.168.0.2)
> node-2 cpu=2 (192.168.0.3 192.168.0.4)
> ...
In 7.0, this might not be too hard.
Sidenote: I would strongly advocate working with the 7.0 code tree
since:
  a) it's a bit different (read: better organized) than the 6.5 tree,
     especially w.r.t. the RPI code,
  b) the RPI is much more modular in the 7.0 tree, and
  c) the 6.5.x tree will likely be retired in the not-too-distant future.
See http://www.lam-mpi.org/cvs/ for details on how to get an anonymous
CVS checkout.
Off the top of my head, here's what I see would need to be done:
- make a new attribute in the boot schema (e.g., "addresses") to put in
all IP addresses. For example:
    node1 cpu=2 addresses="192.168.0.1 192.168.0.2"
    node2 cpu=2 addresses="192.168.0.3 192.168.0.4"
    ...
Without going into details, doing it this way makes the data
available throughout the LAM code base -- arbitrary key=value pairs
are cached on the boot schema data (this is new in 7.0; does not
exist in 6.5.x). So there's no code involved in this step -- just
deciding on the key name.
- in the TCP RPI (and both the shmem RPIs), there is a static
function named connect_all() that takes care of connecting to new
procs (both during MPI_INIT and MPI_COMM_SPAWN*). This is the
function that you'll want to modify. It uses a "dance" algorithm to
make the connections, something along these lines:
    open listening TCP socket
    foreach other_mpi_process
        if already_connected
            continue
        if my_id < other_process_id
            send listening socket IP port to other process (**)
            accept()
        else
            receive IP port number from other process (**)
            connect()
    close listening TCP socket
The two (**) steps are done with LAM's out-of-band communication
mechanism using the function calls nsend() and nrecv() (see their
respective man pages).
I think you would need to modify this function to something like the
following:
    if (have_address_key_in_boot_schema)
        open listening TCP socket on all addresses
        make mapping of who should connect on which socket
    else
        open listening TCP socket on default address
    foreach other_mpi_process
        if already_connected
            continue
        if my_id < other_process_id
            send listening socket IP address and port to other process (**)
            accept()
        else
            receive IP address and port number from other process (**)
            connect()
    close listening TCP socket(s)
i.e., if the "addresses" key is present, open multiple listening TCP
sockets (one per address) and then decide who is going to connect() on
which socket.
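
To make the two new steps a bit more concrete, here is a minimal C sketch
of reading the key's value and picking a socket per peer. It assumes the
value arrives as a plain space-separated string of IPv4 addresses. The
names parse_addresses(), address_index_for_peer(), and MAX_ADDRS, as well
as the round-robin policy, are hypothetical and only for illustration;
none of them exist in the LAM tree.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

#define MAX_ADDRS 8   /* hypothetical upper bound on addresses per node */

/* Hypothetical helper: split a whitespace-separated "addresses" value into
 * validated IPv4 addresses. Returns the number of addresses stored in out[],
 * or -1 if the value is too long, has too many entries, or has a bad entry. */
int parse_addresses(const char *value, struct in_addr out[], int max)
{
    char buf[256];
    char *tok;
    int n = 0;

    if (strlen(value) >= sizeof(buf))
        return -1;
    strcpy(buf, value);            /* strtok() modifies its input, so copy */

    for (tok = strtok(buf, " \t"); tok != NULL; tok = strtok(NULL, " \t")) {
        if (n == max || inet_pton(AF_INET, tok, &out[n]) != 1)
            return -1;
        n++;
    }
    return n;
}

/* Hypothetical mapping policy (an assumption, not LAM behavior): spread
 * peers over the local addresses round-robin by peer rank. */
int address_index_for_peer(int peer_rank, int naddrs)
{
    return peer_rank % naddrs;
}
```

With two addresses per node, this policy sends even-ranked peers to the
first address and odd-ranked peers to the second. Any deterministic rule
would do, as long as both sides of a connection can compute it or the
accepting side sends the chosen address along with the port.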
Modifying this section should be sufficient; the rest of the TCP
progress engine doesn't know or care what IP address it's connected to
-- it just uses the sockets that were opened in connect_all().
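
As a rough illustration of the "open listening TCP socket" step for one
specific address, the sketch below binds to a given local IPv4 address,
lets the kernel pick the port, and reads the port back with getsockname()
so that the address and port can be handed to the peer. The helper name
listen_on_address() is hypothetical; only standard BSD socket calls are
used, and nothing here is taken from the actual connect_all() code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a listening TCP socket bound to one specific
 * local IPv4 address and a kernel-chosen port. On success, returns the
 * socket fd and stores the port (host byte order) in *port_out.
 * Returns -1 on any failure. */
int listen_on_address(const char *ip, unsigned short *port_out)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0);        /* 0 = let the kernel choose a port */
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Bind to the requested address, start listening, then ask the
     * kernel which port it actually assigned. */
    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 ||
        listen(fd, SOMAXCONN) < 0 ||
        getsockname(fd, (struct sockaddr *) &sa, &len) < 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(sa.sin_port);
    return fd;
}
```

Calling this once per entry in the address list would give one listening
socket per interface; the fd returned by accept() on any of them is an
ordinary connected socket as far as the rest of the code is concerned.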
Does that help?
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/