
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-06-28 08:19:54


On Jun 25, 2005, at 12:05 PM, Craig Lam wrote:

> Thank you again for your response. I'm a bit baffled by this myself.
> I've been poring through the source code for LAM in an attempt to
> understand the stdout redirection, but I'm afraid this will probably
> take quite some time.

Yes, unfortunately it's quite twisted and tangled code.

> I've included the output of configure (both the
> log and the output to stdout/err piped together to
> configure.std_output.log.gz) as attachments to this email. You're most
> likely looking for the line "checking fd passing using RFC2292 API...
> passed" in the configure output to stdout, which I saw just fine.

Ok. There are actually several relevant lines -- we test for all
possible fd-passing systems:

checking BSD 4.3 for msg_accrights in struct msghdr... no
checking for BSD 4.3 fd passing support... no
checking for POSIX.1g struct msghdr... yes
checking fd passing using RFC2292 API... passed
checking for BSD 4.4 fd passing support... yes (RFC2292 API)
checking for System V Release 4 for struct strrecvfd... yes
checking System V Release 4 fd passing example... failed
checking for System V Release 4 fd passing support... no

But the end result is the same -- it looks like you have BSD 4.4
support (RFC2292). The configure test actually compiles and runs a
short test that performs fd passing; if the test passes, your BSD 4.4
fd passing *should* be working properly on your machine.
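For anyone following along, here is roughly what "BSD 4.4 fd passing (RFC2292 API)" means in code. This is a hypothetical sketch, not the program configure actually compiles: one process sends an open file descriptor over a UNIX-domain socket as SCM_RIGHTS ancillary data (the msg_control / CMSG_* style of struct msghdr), and the receiver gets a new descriptor referring to the same open file. The names send_fd() and recv_fd() are made up for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a UNIX-domain socket using
   SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd_to_send)
{
    char dummy = 'x';   /* at least one byte of ordinary data must accompany the fd */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {             /* the union keeps the control buffer aligned for cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
   Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;   /* no descriptor arrived with the data byte */

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A quick self-test in the same spirit as configure's: create a socketpair() and a pipe(), send the pipe's write end across the socket, write through the received descriptor, and check that the data comes out of the pipe's read end.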

Are you running the same version of the OS over your entire cluster?

> Is there any resource that describes how the standard out redirection
> occurs in natural language so that I could understand this quickly?

Unfortunately, no. But here's a quick breakdown (this is from memory;
it's been quite a long time since I've looked at this code, so this may
not be 100% accurate, but it's close enough to give you the spirit of
what is happening):

- lamboot is run and you get a set of LAM daemons (lamds)
- mpirun contacts the local lamd and passes its stdin/out/err file
descriptors
- mpirun contacts each relevant lamd and tells it to launch your process
- for all nodes where mpirun is not run:
   - before launching, the lamd connects the stdin/out/err to pipes that
lead back into the lamd (i.e., after the fork but before the exec)
   - each lamd then execs your process(es)
   - when information is received on the stdout/err pipes, the lamd
forwards the data to the lamd where mpirun is running
- for the node where mpirun is running:
   - before launching, the lamd passes the file descriptors that it
received from mpirun to the newly-forked process and dup2's them into
stdin/out/err (hence, they write directly to mpirun's stdout/err
through normal unix mechanisms)
   - when the lamd receives remote stdout/err data, it writes it to the
file descriptors that it received from mpirun
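The remote-node half of the breakdown above (fork, rewire stdout/err onto a pipe before the exec, then read the pipe and forward whatever arrives) can be sketched with plain POSIX calls. This is a hypothetical illustration, not LAM's actual code: run_and_forward() is a made-up name, and out_fd stands in for wherever the lamd would ship the data (in LAM, ultimately the descriptors mpirun handed to its local lamd). On the mpirun node the pipe step is skipped; there the lamd would instead dup2() the descriptors it received from mpirun straight onto the child's stdin/out/err.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv with stdout/stderr redirected into a
   pipe and copy everything the child writes to out_fd.
   Returns the child's exit status, or -1 on error. */
int run_and_forward(char *const argv[], int out_fd)
{
    int pfd[2];
    if (pipe(pfd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: after the fork but before the exec, point fds 1 and 2
           at the write end of the pipe. */
        close(pfd[0]);
        dup2(pfd[1], STDOUT_FILENO);
        dup2(pfd[1], STDERR_FILENO);
        close(pfd[1]);
        execvp(argv[0], argv);
        _exit(127);   /* only reached if the exec failed */
    }

    /* Parent: read until EOF (the child exited or closed its output)
       and forward each chunk to out_fd, handling short writes. */
    close(pfd[1]);
    char buf[4096];
    ssize_t n;
    int ok = 1;
    while (ok && (n = read(pfd[0], buf, sizeof(buf))) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w <= 0) {
                ok = 0;
                break;
            }
            off += w;
        }
    }
    close(pfd[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_forward((char *[]){ "echo", "hello", NULL }, STDOUT_FILENO) would print "hello" through the parent rather than directly from the child.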

It's quite complicated, actually. :-\

So, a few follow-up questions:

- What happens if you mpirun only on the local node? E.g., mpirun -np
1 foo
- Does the same behavior happen if you lamexec? E.g., lamexec -np 1
uptime (local node only), or lamexec -np 4 uptime (spanning multiple
nodes)
- Did you confirm that your processes are, indeed, running on your
remote nodes? Can you put a "system("date > /tmp/foo");", for example,
in your code to ensure that they are actually launched properly on all
nodes?
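If a single shared /tmp/foo makes it hard to tell which processes started, a slightly more informative variant of the system("date > /tmp/foo") trick is to have every process write its own marker file. The sketch below is a hypothetical helper, not part of LAM: mark_launched() and the /tmp/launched.<hostname>.<pid> naming are assumptions. In a real MPI program you would call it near the top of main(), e.g. right after MPI_Init, then log in to each node and list /tmp.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: record that this process actually started by
   writing hostname, PID and start time to /tmp/launched.<host>.<pid>.
   The path used is copied into path_out. Returns 0 on success, -1 on error. */
int mark_launched(char *path_out, size_t path_len)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';   /* gethostname may not NUL-terminate on truncation */

    int n = snprintf(path_out, path_len, "/tmp/launched.%s.%ld",
                     host, (long)getpid());
    if (n < 0 || (size_t)n >= path_len)
        return -1;   /* path did not fit in the caller's buffer */

    FILE *f = fopen(path_out, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "host=%s pid=%ld time=%ld\n",
            host, (long)getpid(), (long)time(NULL));
    return fclose(f) == 0 ? 0 : -1;
}
```

Using one file per process (rather than one redirected date per node) means a missing file points at a specific process that never got as far as main().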

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/