
LAM/MPI General User's Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-03-17 09:05:04


On Mar 17, 2006, at 7:46 AM, Alexandre Carissimi wrote:

> Thanks a lot for your very interesting answers. But, based on
> them, I'm now curious about two things (again :)) :
>
> (1) MPI processes make TCP connections on demand. I mean, when a
> process A needs to send (or receive) something to/from a
> process B, they establish a TCP connection. How long does this
> connection exist? For example, suppose I have a simple ping-
> pong. Is the connection established the first time, with all
> other ping messages flowing over that connection, or does
> each send/receive pair make a new connection?

No; LAM/MPI wires up the full TCP mesh between processes during
MPI_INIT, and the connections stay alive for the life of the process.
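To make the distinction concrete, here is a small sketch (plain Python sockets, not LAM's actual wire protocol) of the long-lived-connection model described above: one connection is set up once, and every ping-pong round reuses it instead of opening a new one per send/receive pair.

```python
import socket
import threading

# Illustration only: one TCP-style connection reused for many
# ping-pong rounds, as opposed to a new connection per message pair.

def echo_server(conn, rounds):
    for _ in range(rounds):
        conn.recv(16)             # receive "ping"
        conn.sendall(b"pong")     # reply on the same connection

client, server = socket.socketpair()
ROUNDS = 5
t = threading.Thread(target=echo_server, args=(server, ROUNDS))
t.start()

replies = []
for _ in range(ROUNDS):
    client.sendall(b"ping")       # every round reuses the one socket
    replies.append(client.recv(16))
t.join()

print(replies.count(b"pong"))
```

The setup cost (here, `socketpair()`; in LAM, the mesh built during MPI_INIT) is paid once, which is why repeated ping-pong traffic sees no per-message connection overhead.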

> (2) Why exactly does a pipe exist between each MPI process and
> the lamd (if I understood correctly)? Is it to support the LAM
> commands like lamgrow, lamclean, lamwipe, etc.? It seems that
> MPI_Init() uses this pipe.

Yes, and also things like MPI_Comm_spawn, MPI_Comm_accept,
MPI_Comm_connect, etc., which require some contact info to be sent
between processes.
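The pipe idea can be sketched like this: an application process hands a small piece of contact information to a local daemon over a Unix pipe. The message format below is invented purely for illustration; it is not LAM's actual protocol.

```python
import os

# Sketch: child plays the MPI process, parent plays the local daemon.
read_fd, write_fd = os.pipe()
pid = os.fork()

if pid == 0:
    # Child ("MPI process"): writes its contact info into the pipe.
    os.close(read_fd)
    os.write(write_fd, b"contact-info: host=nodeA port=5000")
    os.close(write_fd)
    os._exit(0)
else:
    # Parent ("lamd"): reads the contact info from the pipe.
    os.close(write_fd)
    msg = os.read(read_fd, 1024)
    os.close(read_fd)
    os.waitpid(pid, 0)
    print(msg.decode())
```

One consequence, relevant to the migration problem mentioned below, is that pipe file descriptors are local kernel state: they cannot meaningfully survive a move to another node the way the data sent over them can.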

> My interest in knowing these details is that we are planning to
> stop MPI applications and launch them on other nodes (same CPU
> architecture and OS, of course), and we have been having some
> problems. It seems that the pipe info is stored inside the
> checkpoint file, and this is breaking our migration process. On
> the same nodes I can stop (checkpoint) and restart MPI processes
> without problems.

You should take a look at our papers and documentation on the
checkpoint/restart facilities in LAM (which actually are on the
web...). Currently it doesn't support migrating just one process
(you have to checkpoint/restart the *entire* job if any one process
is restarted), but it might very well be a good starting point. We
have a component infrastructure for checkpointers and currently have
two implementations: one for the Berkeley Lab Checkpoint/Restart
(BLCR) library, and another for what we call "self", where we call a
callback provided by the user to do the actual checkpointing.
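The "self" style can be sketched as follows. The registration and checkpoint functions here are hypothetical names for illustration; LAM's real interface differs. The key idea is that the runtime invokes a user-supplied callback, and that callback alone knows how to capture the application's state.

```python
import json
import os
import tempfile

# Hypothetical "self"-checkpointer sketch: the runtime calls a
# user-registered callback to obtain the state to save.
_checkpoint_cb = None

def register_checkpoint_callback(cb):
    """Application registers the function that captures its state."""
    global _checkpoint_cb
    _checkpoint_cb = cb

def take_checkpoint(path):
    """The runtime would call this at checkpoint time."""
    state = _checkpoint_cb()          # ask the app for its own state
    with open(path, "w") as f:
        json.dump(state, f)

# Application side: it knows that only `iteration` matters.
iteration = 7
register_checkpoint_callback(lambda: {"iteration": iteration})

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
take_checkpoint(path)
with open(path) as f:
    restored = json.load(f)["iteration"]
print(restored)
```

The attraction of this approach is that the application can save a small, portable description of its state instead of a full memory image, which sidesteps node-local artifacts (like pipe descriptors) entirely.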

Hope this helps,

Brian

> Brian Barrett wrote:
>> On Mar 16, 2006, at 5:55 PM, Alexandre Carissimi wrote:
>>
>>
>>> I was looking for the paper:
>>>
>>> Brian Barrett, Jeff Squyres, Andrew Lumsdaine. LAM/MPI Design
>>> Document. Open Systems Laboratory, Pervasive Technology Labs,
>>> Indiana University.
>>>
>>> It is mentioned in some LAM publications, but I couldn't find it.
>>
>>
>> You're right, we don't have such a document on the web page. I
>> believe that we decided that the documentation was not up-to-date, so
>> it was pulled. I will try to see if I can find the document, but
>> thus far I've failed.
>>
>>
>>> I would like to ask two questions about the LAM RTE (Run-Time
>>> Environment):
>>>
>>> (1) At the lamboot command, a set of n lamd daemons is started
>>> on the nodes described in the hostfile, defining a
>>> multiprocessor virtual machine (isn't it?). My question is:
>>> do the lamds establish a fully connected mesh among
>>> themselves? Is this done using TCP connections?
>>
>>
>> At startup, contact information is shared between all lamd
>> processes. Communication between lamd processes is over UDP, which
>> means that we don't have to do fully connected meshes. When new lamd
>> processes are started (through lamgrow), the new processes share
>> their contact info with all other existing processes.
>>
>>
>>> (2) Does an MPI process communicate with another MPI process
>>> using the lamd as an intermediary? I mean, does an MPI
>>> process make a TCP connection to another MPI process on a
>>> remote (or even local) node? Or does each MPI process talk to
>>> its lamd over a Unix pipe, with the lamds communicating among
>>> themselves using TCP? Is this correct?
>>
>>
>> It is possible (but not the default) for MPI applications to use
>> the lamd communication channel for MPI communication. The default,
>> however, is to use a direct connection between MPI processes.
>> LAM/MPI currently supports transfer over mixed shared memory and
>> TCP, pure TCP, Myrinet/GM, and InfiniBand.
>>
>>
>>> In fact, I have a third question: when I use the checkpoint/
>>> restart support, mpirun loads two additional modules: CRLAM and
>>> CRMPI. Do these modules coordinate their behavior among the
>>> nodes using UDP or TCP? Do they make additional dedicated TCP
>>> connection pairs for this function, or do they communicate
>>> using the lamd?
>>
>>
>> The CR modules coordinate behavior over the out-of-band
>> communication channel provided by the lamds, so the data is
>> ultimately transferred over UDP.
>>
>>
>>> If someone could help me answer these questions or give me
>>> pointers, I'd appreciate it. For the moment, I'm in a bit of a
>>> rush to "dig deep inside" the MPI sources to look for these
>>> details. Any hints will be helpful.
>>
>>
>> Let us know if you have any other questions.
>>
>> Brian
>>
>>
>
>
> --
> ___________________________________________________________________
> CARISSIMI, Alexandre Universidade Federal do Rio Grande do Sul
> asc_at_[hidden] Instituto de Informática
> Tel: +55.51.33.16.61.69 Caixa Postal 15064
> Fax: +55.51.33.16.73.08 CEP:91501-970 Porto Alegre - RS - Brasil
> ___________________________________________________________________
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/