LAM/MPI General User's Mailing List Archives


From: Pierre Valiron (Pierre.Valiron_at_[hidden])
Date: 2005-09-01 04:43:09


Bogdan Costescu wrote:

>On Tue, 30 Aug 2005, Pierre Valiron wrote:
>
>>This behaviour is very annoying for scripting batch jobs.
>
>I have used a similar approach (lamboot immediately followed by mpirun
>then lamhalt) in a wrapper script executed under SGE, Torque (using
>their native start-up mechanisms) and with plain rsh/ssh, and never
>encountered such a problem.
>
>>From the error message, I understand that it's mpirun that fails
>somehow to start the job; the daemons should be properly started at
>that point, otherwise I think that the error message would be
>different.
>
>Can you try using the -s option of mpirun? This makes mpirun not
>rely on NFS (or whatever shared FS you are using) to provide the
>program; instead, it copies the program itself from the first node.
>It is mentioned in the mpirun man page, and I have seen myself with
>NFS that if the program is freshly produced (as the result of a
>compile/link process), there might be errors when trying to execute
>it immediately.
>Another way to prove this is not to execute your program directly but
>to wrap it in a shell script that does some 'echo' and then execs
>your program - if you get the echoes from all nodes, it means that
>mpirun did try to start the job on all nodes and it is the program
>itself that does not run far enough to reach MPI_Init.
>
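
As an aside, the echo-then-exec check can also be done from inside the
program itself: anything printed before MPI_Init proves the binary was
started on that node at all. A minimal sketch (the program name and
messages are illustrative):

      program probe
      include 'mpif.h'
      integer me, err
c     Output here proves the executable itself started on this
c     node; failing before this point is a start-up problem, not
c     an MPI_Init problem.
      write(*,*) 'executable started, calling MPI_Init'
      call MPI_Init(err)
      call MPI_Comm_rank(MPI_COMM_WORLD, me, err)
      write(*,*) 'rank', me, 'passed MPI_Init'
      call MPI_Finalize(err)
      end
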
Well, I finally found that the problem was related to the behaviour of
MPI_Init. The code snippet below is buggy when started over many nodes
and procs:

      program work
      include 'mpif.h'
      integer me, nprocs, err

      call MPI_Init(err)
      call MPI_Comm_rank(MPI_COMM_WORLD, me, err)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, err)
c     ... (some work) ...
      call MPI_Finalize(err)
      end

If I include
      call MPI_Barrier(MPI_COMM_WORLD,err)
right after MPI_Init, all problems disappear.
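
For reference, the working variant of the snippet then reads (same
program as above, with only the barrier added):

      program work
      include 'mpif.h'
      integer me, nprocs, err

      call MPI_Init(err)
c     Synchronizing all processes right after start-up is what
c     makes the failures disappear.
      call MPI_Barrier(MPI_COMM_WORLD, err)
      call MPI_Comm_rank(MPI_COMM_WORLD, me, err)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, err)
c     ... (some work) ...
      call MPI_Finalize(err)
      end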

I could not determine exactly what the MPI_Barrier call cures. Whether
it fixes a wrong MPI_Comm_rank or MPI_Comm_size, or a not yet fully
functional MPI environment, is hard to say, as one process dies before
writing anything...
Using mpirun -s reduces the occurrence of the bug, but does not provide a
cure. For some unknown reason, adding a sleep after lamboot also helps.

Very strange.

Pierre.

>
>>We are very proud of our fast OAR batch system, which starts a
>>100-proc job in a second, and we don't want to introduce unneeded
>>delays.
>
>I had never heard of OAR, so thanks for mentioning it!

-- 
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/
       _/_/_/_/    _/       _/       Dr. Pierre VALIRON
      _/     _/   _/      _/   Laboratoire d'Astrophysique
     _/     _/   _/     _/    Observatoire de Grenoble / UJF
    _/_/_/_/    _/    _/    BP 53  F-38041 Grenoble Cedex 9 (France)
   _/          _/   _/    http://www-laog.obs.ujf-grenoble.fr/~valiron/
  _/          _/  _/     Mail: Pierre.Valiron_at_[hidden]
 _/          _/ _/      Phone: +33 4 7651 4787  Fax: +33 4 7644 8821
_/          _/_/