LAM/MPI General User's Mailing List Archives

From: McCalla, Mac (macmccalla_at_[hidden])
Date: 2005-02-09 12:54:54


Hi,
        What is file descriptor 3 (the one used by the write) pointing at
for this process?
An NFS-mounted file, perhaps?
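On Linux, that question can be answered for the hung process by reading the `/proc/<pid>/fd/3` symlink, where `<pid>` is the PID of the hung process. The shell equivalent is `ls -l /proc/<pid>/fd/3`. Below is a minimal Python sketch of the same lookup. The helper name `fd_target` is hypothetical, and the code is Linux-only because it relies on `/proc`. A regular file resolves to its path, a socket to `socket:[<inode>]`, and a pipe to `pipe:[<inode>]`. For a Unix-domain socket, that inode can then be looked up in `/proc/net/unix`.

```python
import os


def fd_target(pid: int, fd: int) -> str:
    """Return what file descriptor `fd` of process `pid` refers to.

    Hypothetical helper; Linux-only. It reads the /proc/<pid>/fd/<fd>
    symlink. Regular files resolve to a path, sockets to
    "socket:[<inode>]" and pipes to "pipe:[<inode>]".
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

Reading another user's process normally requires the same UID (or root), so it is easiest to run this as the user who owns the job.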

HTH,
        Mac McCalla

"I went mad for a while, did me no end of good." -Ford Prefect (from
Life, the Universe and Everything...Douglas Adams)

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf
Of Michael Arndt
Sent: Tuesday, February 08, 2005 3:57 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: Problem connecting lamd on Opteron / RH ES 3.0

Hello Anthony,

Thanks for your answer; you are completely right
about the lamnodes issue.
My mail was somewhat misleading.

My "commercial application" simply stops producing output, and
the job "hangs" forever.

Here is an strace I ran inside a shell wrapper launched
from the mpirun command:

uname({sys="Linux", node="cae1", ...}) = 0
stat64(0x40ed8c00, 0xffffc104) = 0
getuid32() = 1000
getcwd("/scratch/lsf.micha.422", 2048) = 23
chdir("/scratch/lsf.micha.422/lam-micha_at_cae1-lsf-422-0") = 0
socketcall(0x1, 0xffffc208) = 3
socketcall(0x3, 0xffffc208) = 0
chdir("/scratch/lsf.micha.422") = 0
socketcall(0xf, 0xffffc278) = 0
socketcall(0xf, 0xffffc278) = 0
getppid() = 9238
rt_sigaction(SIGUSR2, {0x1400000040ecc488, [], SA_NOMASK|0x555eb8}, {SIG_DFL}, 4294950972) = 0
rt_sigprocmask(SIG_BLOCK, [USR2], [], 4294951176) = 0
write(3, "\5\0\0\0\377\377\377\377\27$\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 96) = 96
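The `socketcall(0x…, …)` lines are easier to read once they are decoded. The `stat64` and `getuid32` calls indicate a 32-bit x86 process, and such processes reach the socket API through the `socketcall()` multiplexer. The first argument of that call is a sub-call number defined in `<linux/net.h>`. Below is a minimal Python sketch of a decoder. It assumes the classic sub-call table (1 to 17), and the names `SOCKETCALL_NAMES` and `decode_socketcall` are hypothetical. Decoded this way, the trace reads as `socket()` returning fd 3, `connect()` on it succeeding, and two `getsockopt()` calls, followed by the 96-byte `write()` on fd 3. That suggests fd 3 is a socket rather than an NFS file. The peer is not visible in the trace. It is plausibly the local lamd, given the brief `chdir` into the LAM session directory around the `connect()`, but that part is an inference.

```python
import re
from typing import Optional

# Sub-call numbers of the socketcall(2) multiplexer, as defined in
# <linux/net.h> (classic set; newer kernels append further calls).
SOCKETCALL_NAMES = {
    1: "socket", 2: "bind", 3: "connect", 4: "listen", 5: "accept",
    6: "getsockname", 7: "getpeername", 8: "socketpair", 9: "send",
    10: "recv", 11: "sendto", 12: "recvfrom", 13: "shutdown",
    14: "setsockopt", 15: "getsockopt", 16: "sendmsg", 17: "recvmsg",
}

# Matches the first argument of a socketcall(...) line, in hex or decimal.
_SOCKETCALL_RE = re.compile(r"socketcall\((0x[0-9a-fA-F]+|\d+),")


def decode_socketcall(line: str) -> Optional[str]:
    """Return the socket API call behind a strace 'socketcall' line, or None."""
    m = _SOCKETCALL_RE.search(line)
    if m is None:
        return None
    token = m.group(1)
    number = int(token, 16) if token.lower().startswith("0x") else int(token)
    return SOCKETCALL_NAMES.get(number, "unknown")


# The socketcall lines copied from the trace above.
TRACE = """\
socketcall(0x1, 0xffffc208) = 3
socketcall(0x3, 0xffffc208) = 0
socketcall(0xf, 0xffffc278) = 0
socketcall(0xf, 0xffffc278) = 0
"""

for line in TRACE.splitlines():
    # Prints: socket -> 3, connect -> 0, getsockopt -> 0, getsockopt -> 0
    print(decode_socketcall(line), "->", line.rsplit("= ", 1)[1])
```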

Can anyone tell where, or why, this "hangs"?

I am puzzled, since *exactly* the same configuration runs
perfectly well on another cluster.

TIA
Micha

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/