LAM/MPI General User's Mailing List Archives

From: McCalla, Mac (macmccalla_at_[hidden])
Date: 2003-07-23 16:45:53


Hello,

Our LAM environment is 6.5.9. I have a user running a LAM/MPI job on a
40-node Beowulf cluster. Today his job has encountered 4 failures,
as follows:

At 10:58 step 1 ---> " lamboot -v -s lamboot.mpi" command received
         PBIND-------------------------------------
         LAM failed to execute a LAM binary on the remote node beo132.

(Lamhalt was then executed successfully.)
Job was restarted by the user at step 1 (lamboot).

At 12:00 step 2 ---> " mpirun -lamd -w -pty -O -x
NJS_WORKDIR,NJS_STEPNAME xa=-40_migrate.apps" command received
         YPBINDPROC_DOMAIN: Domain not bound

Job was restarted by the user at step 2.

At 12:33 step 29 ---> " mpirun -lamd -w -pty -O -x
NJS_WORKDIR,NJS_STEPNAME xa=-35_image_reduce.apps"
command received
          Mpirun: cannot start /u/morton/fxmig/test/run_dv_reduce_bcast
on n35: invalid address tag

Lamhalt was executed successfully.
Lamboot was executed manually by the user.

Job was restarted by the user at step 29.

At 12:47 step 84 ---> " mpirun -lamd -w -pty -O -x
NJS_WORKDIR,NJS_STEPNAME xa=-24_image_reduce.apps"
command received
          Mpirun: cannot start /u/morton/fxmig/test/run_dv_reduce_bcast
on n3: invalid address tag

It is not clear whether lamhalt was executed.
Lamboot was executed manually by the user.

Job was restarted by the user at step 84.

At 13:52 step 109 ---> " mpirun -lamd -w -pty -O -x
NJS_WORKDIR,NJS_STEPNAME xa=-19_image_reduce.apps"
command received
        Mpirun: cannot start /u/morton/fxmig/test/run_dv_reduce_bcast on
n35: invalid address tag

Lamhalt was executed.
Lamboot was executed manually by the user.

Job was restarted by the user at step 109.

=========================================================
Are the PBIND and YPBINDPROC errors related? Could they be related to
the invalid address tag messages? What do the invalid address tag
messages mean? Is there some diagnostic technique we could use to trap
what causes the invalid address tag message?
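
As a possible first diagnostic step, the failures listed above could be
tallied by error type and node, to see whether particular nodes (n35
appears twice here) account for most of them. The following is only a
minimal Python sketch, assuming each step's output is captured as text.
The names SIGNATURES, NODE, classify and tally are hypothetical helpers,
not part of LAM, and the patterns are taken only from the messages
quoted in this post.

```python
import re
from collections import Counter

# Error signatures copied from the messages quoted in this post.
# Assumption: these substrings are enough to tell the failures apart.
SIGNATURES = {
    "pbind": re.compile(r"(?<![A-Z])PBIND"),
    "ypbind_domain_not_bound": re.compile(
        r"YPBINDPROC_DOMAIN: Domain not bound"),
    "remote_exec_failed": re.compile(
        r"failed to execute a LAM binary", re.IGNORECASE),
    "invalid_address_tag": re.compile(
        r"invalid address tag", re.IGNORECASE),
}

# Node names as they appear above: LAM node IDs ("on n35") or
# host names ("remote node beo132").
NODE = re.compile(
    r"\bon\s+(n\d+)\b|\bremote node\s+([\w-]+(?:\.[\w-]+)*)",
    re.IGNORECASE,
)


def classify(output):
    """Return (kinds, nodes) found in one step's captured output."""
    # Rejoin messages that were wrapped across lines.
    text = " ".join(output.split())
    kinds = sorted(k for k, pat in SIGNATURES.items() if pat.search(text))
    nodes = sorted({a or b for a, b in NODE.findall(text)})
    return kinds, nodes


def tally(step_outputs):
    """Count (error kind, node) pairs over many captured outputs."""
    counts = Counter()
    for output in step_outputs:
        kinds, nodes = classify(output)
        for kind in kinds:
            # Some messages name no node; count those as "unknown".
            for node in nodes or ["unknown"]:
                counts[(kind, node)] += 1
    return counts
```

Feeding the captured output of every failed step to tally() and
printing the result sorted by count would show at a glance which nodes
and which messages recur.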

Thanks for your time.

Mac McCalla
Amerada Hess Corp.
Houston, TX