Here's a crazy idea -- throw in a "sleep(10);" before you call
MPI_Finalize(). This may help on the off chance that the output *is*
being sent properly, but is simply not being displayed because it
arrives *after* all the processes terminate and mpirun terminates.
This is quite unlikely ("impossible" is a Very Big Word for software
developers), but certainly, in a Murphy's Law kind of way, possible.
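Concretely, something like the following just before shutdown (the 10-second
delay is arbitrary; the fflush() calls are an extra precaution in case the
output is stuck in stdio buffers rather than in transit):

```c
#include <unistd.h>   /* for sleep() */

/* ... end of your MPI program ... */

fflush(stdout);   /* force any buffered output out of the stdio layer */
fflush(stderr);
sleep(10);        /* give the lamds a chance to forward pending output
                     before mpirun and the processes tear down */
MPI_Finalize();
```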
On Jun 28, 2005, at 11:14 AM, Craig Lam wrote:
> On 6/28/05, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On Jun 25, 2005, at 12:05 PM, Craig Lam wrote:
>>
>>> Thank you again for your response. I'm a bit baffled by this myself.
>>> I've been poring through the source code for lam in an attempt to
>>> understand the stdout redirection, but I'm afraid this will probably
>>> take quite some time.
>>
>> Yes, unfortunately it's quite twisted and tangled code.
>>
>
> Yeah, I discovered this too. :)
>
>
>>> I've included the output of configure (both the
>>> log and the output to stdout/err piped together to
>>> configure.std_output.log.gz) as attachments to this email. Your most
>>> likely looking for the line "checking fd passing using RFC2292 API...
>>> passed" in the configure output to stdout, which I saw just fine. You're most
>>
>> Ok. There's actually several relevant lines -- we test for all
>> possible fd-passing systems:
>>
>> checking BSD 4.3 for msg_accrights in struct msghdr... no
>> checking for BSD 4.3 fd passing support... no
>> checking for POSIX.1g struct msghdr... yes
>> checking fd passing using RFC2292 API... passed
>> checking for BSD 4.4 fd passing support... yes (RFC2292 API)
>> checking for System V Release 4 for struct strrecvfd... yes
>> checking System V Release 4 fd passing example... failed
>> checking for System V Release 4 fd passing support... no
>>
>> But the end result is the same -- it looks like you have BSD 4.4
>> support (RFC2292). The configure test actually compiles and runs a
>> short test that performs fd passing; if the test passes, your BSD 4.4
>> fd passing *should* be working properly on your machine.
>>
>> Are you running the same version of the OS over your entire cluster?
>>
>
> Yes, same statically compiled Linux 2.6.11 kernel image on each node.
>
>>> Is there any resource that describes how the standard out redirection
>>> occurs in natural language so that I could understand this quickly?
>>
>> Unfortunately, no. But here's a quick breakdown (this is from memory;
>> it's been quite a long time since I've looked at this code, so this
>> may
>> not be 100% accurate, but it's close enough to give you the spirit of
>> what is happening):
>>
>> - lamboot is run and you get a set of LAM daemons (lamd's)
>> - mpirun contacts the local lamd and passes its stdin/out/err file
>> descriptors
>> - mpirun contacts each relevant lamd and tells it to launch your
>> process
>> - for all nodes where mpirun is not run:
>> - before launching, the lamd chains the stdin/out/err to pipes that
>> go into the lamd (i.e., after the fork but before the exec)
>> - each lamd then exec's your process(es)
>> - when information is received on the stdout/err pipes, the lamd
>> forwards the data to the lamd where mpirun is running
>> - for the node where mpirun is running:
>> - before launching, the lamd passes the file descriptors that it
>> received from mpirun to the newly-forked process and dup2's them into
>> stdin/out/err (hence, they write directly to mpirun's stdout/err
>> through normal unix mechanisms)
>> - when the lamd receives remote stdout/err data, it writes it to
>> the
>> file descriptors that it received from mpirun
>>
>> It's quite complicated, actually. :-\
>>
>
> That is invaluable information for anyone trying to debug this type of
> problem, or understand the LAM architecture in general! Thank you so
> much!!
>
>
> Strangely.. and (seemingly) randomly, stdout seems to be working now.
> As far as I can tell, I didn't do anything different. I'm even more
> perplexed now than I was before. I was running lam jobs with no
> forwarded output from remote nodes all morning, and then I ran lamexec,
> and it seems to be working correctly now. I'd really like to discover
> what the problem was, but, frankly, I'm a bit confused.
>
>
>> So, a few followup questions:
>>
>> - What happens if you mpirun only on the local node? E.g., mpirun -np
>> 1 foo
>> - Does the same behavior happen if you lamexec? E.g., lamexec -np 1
>> uptime (local node only), or lamexec -np 4 uptime (spanning multiple
>> nodes)
>> - Did you confirm that your processes are, indeed, running on your
>> remote nodes? Can you put a "system("date > /tmp/foo");", for
>> example,
>> in your code to ensure that they are actually launched properly on all
>> nodes?
>
> When the strange behavior was exhibiting itself, output always showed
> up from a single node job as it would be run on the node that I
> executed mpirun on. All output from the local node showed up, though
> no output on other nodes showed up. The actual program was running, I
> verified this several different ways (running top, passing messages,
> checking that messages would be passed properly.)
>
> As I said, I'd really like to track this problem down even though it's
> no longer occurring. If anyone has any ideas, please let me know.
>
> Craig
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/